Language testing is one of the most obvious and important areas for
activity in applied linguistics. It fits the general paradigm well: the
problem is clearly language related, and the solution must come from
linguistics and from another discipline--psychometrics--as well. Each of
the major branches of linguistics--theoretical, psycholinguistic, and
sociolinguistic--has its own special role to play, and each has exerted
its influence on the development of the field. The involvement of
psychometrics is a necessity as well. Good language testing needs to be
based on relevant knowledge from applied linguistics and from psychometrics.

Given this, and considering the social relevance of the field, it is
appropriate that this third series of Papers in Applied Linguistics be
dedicated to chronicling Advances in Language Testing. The series will
first survey the state of the art and then present theoretical, practical,
and technical articles that record its progress. Each issue will have a
specific theme; the series as a whole is meant to provide a means for
continuing communication among all those concerned with language testing,
whether as users, practitioners, or theorists.

The first three fascicles have a special history which should be
mentioned here. They include original and revised versions of articles
commissioned to appear in a volume intended to be called Current Trends
in Language Testing. The original publisher's difficulties left the
manuscripts in limbo for some time, and the size of the enterprise
discouraged other publishers from taking up the project. Written as a
survey of the state of the art, they are a good start for
Introduction:
Linguists and Language Testers

One of the distinguishing features of Western education in the twentieth
century has been the emphasis on testing. Formal objective testing has
come to be considered one of the most critical steps in the modernization
of education. In many countries, the conflict between traditional
"subjective" examinations and the newer "objective" standardized tests is
still a central issue for professional and public debate. Concurrent
with the growth of testing, there has developed a body of professionals
trained and qualified in educational measurement. These testers, whose
field is called psychometrics, tend to find their basic concepts and
techniques in psychology, in general, and in educational psychology, in
particular. This perhaps explains their special concern with the question
of how to test: treated often as technicians, they tend to assume less
responsibility for deciding what to test. While this is true of testing
in most fields, in the area of language testing the situation is
fundamentally different; language testers are much more likely to be
linguists, and thus subject-matter specialists, than they are to be trained
primarily in educational measurement. The tendency is clearly illustrated
in this volume, where most of the contributors, all of them
professionally involved with language testing to a great extent, would
consider themselves linguists rather than psychologists. It will be the
purpose of this introduction to attempt to find some explanation for this
trend, and to explore the special reasons why applied linguists find
language testing so fruitful an activity for their research and practice.

It is useful, though an over-generalization, to divide language
testing into three major trends, which I will label the pre-scientific,
the psychometric-structuralist, and the integrative-sociolinguistic. The
trends follow in order but overlap in time and approach. The third picks
up many elements of the first, and the second and third co-exist and
compete. But the crude classifications will provide a framework for this
discussion and some notion of progress in the field.

The pre-scientific period (or trend, for it still holds sway in many
parts of the world) may be characterized by a lack of concern for
statistical matters or for such notions as objectivity and reliability. In
its simplest form, it assumes that one can and must rely completely on the
judgment of an experienced teacher, who can tell after a few minutes'
conversation, or after reading a student's essay, what mark to give. In
the pre-scientific mode, oral examinations of any kind were the exception:
language testing was assumed to be a matter of open-ended written
examinations. Depending on the language teaching philosophy, such
examinations would consist of passages for translation into or from the
foreign language; free composition in it; and selected items of
grammatical, textual, or cultural interest. During this period, and in this
approach, language tests are clearly the business of language teachers, or,
in more formal situations, of language teachers promoted or especially
appointed as examiners. No special expertise is required: if a person
knows how to teach, it is to be assumed that he can judge the proficiency
of his students.

The next period, however, sees the invasion of the field by experts.
The psychometric-structuralist trend, though hyphenated for reasons that
will become apparent, is marked by the interaction (and conflict) of two
sets of experts, agreeing with each other mainly in their belief that
testing can be made precise, objective, reliable, and scientific. The
first of these groups of experts were the testers, the psychologists
responsible for the development of modern theories and techniques of
educational measurement. Their key concerns have been to provide
"objective" measures using various statistical techniques to assure
reliability and certain kinds of validity. Their first thrust was to
demonstrate the unreliability of traditional examinations, and studies such
as those of Pilliner (1952) and others on the marking of essays showed how
unreliable subjective scores can be. This done, they moved to develop more
reliable measures, working to find either techniques for making judgments
more reliable or new kinds of test items more amenable to control.

The better known work of the testers was the development of short
item, multiple choice, "objective" tests. The demands of statistical
measures of reliability and validity were seen as of paramount importance:
"Firstly, the shape of all tests, whether predictive or non-predictive,
language or non-language, is primarily determined by the need to test the
tests for reliability and validity. That is why, for instance, the
multiple choice technique of answering is so common (Ingram, 1968, p. 74)."

There were two results from this emphasis. First, tests like this
required written response, and so were limited to reading and listening.
Second, the items chosen did not reflect newer ideas about language
teaching and learning. The testers and psychologists added "scientific"
techniques to language testing, but left a great number of deficiencies.
Writing in 1952, John Carroll quotes with approval the criticism of
language testing that Robert Lado made in his doctoral thesis:

     A number of conclusions are reached. They are (1) that a great
     lag exists in measurement in English as a foreign language, (2)
     that the lag is connected with unscientific views of language,
     (3) that science of language should be used in defining [...]

Carroll confirms Lado's judgment and notes that a great lag exists in
all foreign language measurement. This had shown up first in the
study of foreign language teaching carried out by Agard and Dunkel (1948):
the only tests available were written tests of vocabulary, reading, and
grammar, and none were [...]

The second major impetus of the "scientific" period, or approach,
then, was when a new set of experts added notions from the science of
language to those from the science of educational measurement. One
scholar who has straddled the two fields for most of his career is John
B. Carroll, whose early (1940) and recent ([...]) work alike show his
concern with psychologically and linguistically valid measures of verbal
abilities, whether in native or learned languages. Carroll's special
role in the development of language tests has arisen from his ability to
speak as a fellow professional to both linguists and psychologists, and
his influence has been widely felt (cf. Carroll, [...], 1968b, 1972).
The importance of his contribution will be clear from a look at the
lists of references in a volume such as this.

In reviews of the state of the art written in 1952 (Carroll, 1952,
1953), Carroll drew attention to Robert Lado's doctoral dissertation,
which marked most clearly the second stage of the "scientific" period,
the addition of linguistic principles to language testing. The
dissertation, written by Lado at the University of Michigan under the
direction of Charles C. Fries, was concerned with the construction of
English achievement tests for Latin-American students. Over the next
decade, Lado refined his notions of language testing, and published in 1961
a book that is a classic exposition of the structural linguist's approach
to testing. It is not too much of an exaggeration to suggest that a great
proportion of work in language testing since Lado (1961) is either based on
it or is an attempt to answer or correct some of the points it makes.

The point of major controversy has probably been the theory of testing
problems. Lado chose to set the contrastive analysis hypothesis as one
of the central assumptions of his testing work, opening himself to
criticism of both the general theory (cf., for example, Hamp, 1968; Di
Pietro, 1971) and its application to testing (Upshur, 1962). But even this
stresses the basic importance of Lado's work, for he was both insisting
on and demonstrating the relevance of linguistics to language testing.
He accepts completely the psychometric principles basic to testing, and
explains them clearly enough for language teachers and even linguists to
understand, but he leaves no doubt that linguists, with their
understanding of the nature of language, must be the ones to set the
specifications for language tests.

There was at the time still an easy congruence between American
structuralist views of language and the psychological theories and
practical needs of testers. On the theoretical side, both agreed that
knowledge of language was a matter of habits; on the practical side,
testers wanted, and structuralists knew how to deliver, long lists of small
items which could be sampled and tested objectively. The structural
linguist's view of language as essentially a matter of item-and-arrangement
fell easily into the tester's notion of a set of discrete skills to be
measured. It is (superficially at least) not too hard to [...] choice test
from structural grammar.

The marriage of the two fields, then, provided the basis for the
flourishing of the standardized language test with its special emphasis
on what Carroll (1961) labelled the "discrete structure point" item. He
noted that "The work of Lado and other language testing specialists has
correctly pointed to the desirability of testing for very specific items
of language knowledge and skill judiciously sampled from the usually
enormous pool of possible items. This makes for highly reliable and valid
testing (Carroll, 1961)."

As a result of Lado's work, both language teachers and linguists have
full access to the field of language testing. There is an important
degree of modesty in his approach, for he accepts the tester's right to
establish kinds of tests and methods of judging validity and reliability,
even while insisting on the responsibility of the linguist to decide what
is to be tested. The three major books on language testing since then
(Valette, 1967; Harris, 1969; and Clark, 1972) share a great number of
his assumptions. They differ, of course, in emphasis: Valette applies the
principles to languages other than English, Harris aims to be concise
and practical, and Clark emphasizes psychometrics, but they are all
largely within the structuralist-psychometric trend.

The major achievement of this trend has probably been the production
of a number of well-designed standardized tests, such as those
administered by Educational Testing Service. The Graduate Record
Examinations Advanced Tests in various languages, the MLA Foreign Language
Tests for Teachers and Advanced Students, the Test of English as a Foreign
Language, and the College Entrance Examination Board Achievement Tests in
various languages are all good-quality tests in this tradition, widely and
confidently used to measure student progress and program success. Similar
tests are now available in Great Britain and parts of Europe, and the
notion of objective testing and the principles on which it is based have
now spread throughout the world. Of greatest importance in this
development has been the possibility of tests that can be used efficiently
with large numbers of subjects over a wide geographical area. The Test of
English as a Foreign Language, for instance, is now given four times a
year at 112 centers in the U.S. and 260 overseas.

The structural-psychometric trend has not completely overcome the
objections of the traditionalists, who continue to feel that less specific
measures are still of great value. They have therefore been instrumental
in the development of more reliable methods of judging the more
subjective kinds of performance. The first was concerned with the judgment
of written proficiency. At the same time that some scholars were showing
that "objective writing" tests, which usually involved multiple choice
items, correlate well with other measures, others were pointing out the
kinds of techniques of shorter essays, scoring guides, and multiple
judgments that add reliability to subjective marking. It was shown, then,
that the traditional tests, with their obvious face validity, could be
improved. The second effort went into the problem of judging oral
proficiency, a skill least satisfactorily handled by the objective tests.
With all its importance, speech production remains the hardest to test.
"The most difficult problems arise when trying to construct tests of
ability to speak a language. ... Suffice it to say that although the ideal
of a test based on free conversation is very attractive, the problems of
sampling and reliable scoring are almost insoluble, unless a good deal
of time and many standardized expert testers are available (Perren, 1968,
p. 115)." When a good deal of time and expert testers are available,
something very good can be done, as is shown by the Foreign Service
Institute testing techniques described by Randall Jones ([...]). The
problem with these tests turns out to be not theoretical but practical:
essay tests and interview tests can be made quite reliable and objective,
but it is expensive to do so. The supporters of discrete item tests, then,
have efficiency as well as theory on their side.

There have, however, been increasingly strong attacks on their
principles, associated with two trends in contemporary linguistics. The
first, which I will call the language competence trend, is connected to
various views of psycholinguists. It is based on a belief in such a
thing as overall language proficiency, and a feeling that knowledge of a
language is more than just the sum of a set of discrete parts. The
second, which I will call the communicative competence trend, is connected
with views of modern sociolinguists: it accepts the belief in integrative
testing, but insists on the need to add a strong functional dimension to
language testing.

The issue was first raised clearly by Carroll (1961). After he had
described the role of discrete structure tests, he went on to argue that
they fail to meet a number of basic criteria for the measurement of
language knowledge. He stressed, therefore, the need for what he called an
"integrative approach," where one pays attention not to specific
structural or lexical items but to the "total communicative effect of an
utterance." Such an approach has several advantages: an integrative
test is broader in its sampling and less likely to be tied to a particular
course of training, the difficulty of the task involved is more easily
related to a subjective standard, and it focuses on the general question
of how well a learner is functioning in the target language, regardless of
his own language background.

Carroll is thus the first to argue for what we call the integrative-
sociolinguistic trend: he refers in this 1961 paper not just to
integrative testing, but to "communicative effect" and "normal
communicative situation," and others who deal with the problem (cf.
Spolsky, 1965; Oller, 1973a; Jakobovits, 1969) are clearly indebted to him.

I have discussed the basic principles of the language competence, or
psycholinguistic, trend in some detail in other papers (Spolsky, 1972,
1973); they are also dealt with by Oller and Ingram in this volume.
Briefly, the argument goes something like this. While structural
linguistic theory held that knowledge of language was a set of habits, with
the consequence for testing that it is possible to select a sample of
discrete items, contemporary linguistic theory emphasizes rather the
creative element of language, the infinite nature of the set of possible
sentences, and the incompleteness of grammars attempting to characterize
knowledge of a language. This change of theoretical view challenges
the linguistic validity of discrete item tests. But there is a second
relevant fact about language, either derived from the statistical theory
of communication or seen as part of a pragmatic grammar: this is that
knowledge of a language necessarily requires the ability to function even
when there is reduced redundancy, making use of what Oller calls an
expectancy grammar. Two major techniques are proposed to handle this:
the cloze test [Holtzman, 1967 (and Oller in this volume)], or a modified
form of it (Darnell, 1968), or the dictation test with (Spolsky et al.,
1968) or without (Oller, 1971) added noise. These procedures, it is
argued, have the reliability of other objective measures, and their
efficiency and ease of administration, and, in addition, the stronger
validity provided by the theory behind them. The proposals are new, and
will need considerable evidence before they are accepted; there is no
mention of the cloze procedure in Lado (1961), Valette (1967), Harris
(1969), or even Clark (1972); and the first three are far from endorsing
dictation. But they clearly demonstrate one of the key elements in the
present trend: linguists unashamedly presenting arguments, on
psycholinguistic grounds alone, as to the nature of language tests.
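
The cloze procedure named above can be illustrated with a minimal
sketch. It assumes the common fixed-ratio form of the technique, in which
every nth word of a passage is deleted and the subject is scored on exact
restoration of the missing words. The function names, the deletion
interval, and the blank marker are illustrative choices, not taken from
the studies cited, and punctuation attached to a word is simply treated
as part of that word.

```python
def make_cloze(text, n=5, blank="____"):
    """Delete every nth word of text; return the gapped passage and the answers."""
    words = text.split()
    answers = []
    # Positions n-1, 2n-1, ... are the nth, 2nth, ... words (0-based indexing).
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


def score_exact(responses, answers):
    """Count responses that restore the deleted word exactly (ignoring case)."""
    return sum(
        r.strip().lower() == a.strip().lower()
        for r, a in zip(responses, answers)
    )
```

Exact-word scoring is only one option; a scorer that also accepts
contextually acceptable substitutes would need a human judge or a word
list, which this sketch does not attempt.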

The second part of the trend is concerned with the need to test
communicative competence. This trend is, in part, a reaction to the
failure of Chomskyan linguistic theory to handle the full complexity of
language use any better than the theories it replaced. Sociolinguistic
emphases are clear in Cooper (1968) and, with a slightly different
emphasis, in Jakobovits (1969); the principles are illustrated in the tests
used by Cooper in the Jersey City study (Fishman et al., 1971), and those
described by Spolsky et al. (1972) and by Levenston (1975). The key
arguments involved in the sociolinguistic trend are twofold, one
simplifying and the other complicating the process. First is the notion
that knowing a language involves being able to use it in certain
circumstances: whatever the specific items a speaker can control, it is his
overall ability to perform with it that counts. A subject must be tested
for his ability to communicate in a given situation. In a test of Navajo-
English dominance, for instance, we assumed that any six-year-old who
could answer the traditional Navajo question, "What is your clan?" could
be considered a fluent speaker of the language (Spolsky et al., 1972).
The complication is, of course, allowing for the knowledge of different
varieties and the ability to handle them in different circumstances.

The various approaches to testing that I have been discussing are
considered in more detail in other chapters in this volume. My main
purpose in this introduction has been to show how it is that linguists have
come to consider that language testing is well within their province.
The general line of development has been something like this. Originally,
testing was simply a teacher function, although many people believed a
teacher's judgment automatically improved when he changed hats and was
identified as an examiner. Next, experts on testing moved into the field
with their principles. It was soon shown that psychologists alone could
not develop good language tests: some linguists like Lado showed that
the job needed to be shared and to depend on two sets of expertise.
Finally, a group of psycholinguists and sociolinguists, with somewhat
imperialistic notions, are starting to claim the field completely for
themselves. Language testing, they seem to be saying, is too important
to be left to language testers.

Even though this may be an exaggeration, we still need to account for
the way in which language testing is such a congenial field to linguists,
whereas other kinds of subject matter testing are usually left to testing
experts. The answer lies, I believe, in the fact that linguists consider
the question of knowledge of language to be central to their concerns.
They are all the time trying to characterize in a grammar what it means
to know a language: it is thus quite reasonable for some linguists to be
interested in measuring knowledge of language. Upshur suggests, "Trends
in second-language testing tend to follow trends in second-language
teaching, and in the United States--at least in recent times--trends in
second-language testing have tended to follow trends in linguistics
(Upshur, 1972, p. 435)."

I believe one can state the position even more directly. As language
testing has come to be a field for linguists, it has become open to direct
influence from developments in linguistics. Because linguists in the last
few years have been concerned with describing knowledge of a language,
and, now, knowledge of language use, language testers have been able to
draw on their theories for practical implications regarding how to measure
such knowledge. Language testing has thus become one of the most fruitful
areas in which linguistics may be applied. More, it has become one of
the areas where the relevance of linguistic theories can be quickly tested
in practice, [...]

There are a great many diverse methods in use for the assessment of
language command. They range from translation of literary passages,
writing essays, dictation, etc. to choosing the correct alternative from a
set of multiple choice items, to be answered at high speed, each testing
some highly specific point of phonology, syntax, or vocabulary.

The psycholinguistic bases for these practices are not at all clear.
It does not help that the delimitation of the term "psycholinguistics"
is not very clear either. In this article I shall first attempt to deal
with some of the interpretations of the term psycholinguistics, and then
try to relate these to traditional and recent testing practices.

PSYCHOLINGUISTICS

There are 2 interpretations of the term psycholinguistics (in fact,
there are several, but resolving into a dichotomy of one versus all the
rest). The first is highly specific, closely linked to generative
linguistics, and owes its origin to Chomsky (1965, 1968). For some people
this is the only acceptable use of the term. For others, like Carroll,
who, I believe, first used the term in print (1953), psycholinguistics
is simply a word used to cover any area of joint interest to
psychologists and linguists, regardless of theoretical orientation and
degree of formality.

I shall discuss first the Chomskyan interpretation. As Kuhn (1962)
has pointed out, a science (or a movement within a science) consists not
only of a set of theories and of techniques and methods for discovering
and describing events, but also of a set of beliefs and attitudes about
the proper way of regarding the whole enterprise--about the true nature
of the object under study and the correct way to approach it. In
transformational linguistics and psycholinguistics, these a priori
attitudes are particularly important.

In Chomsky's view, a language consists of a non-finite set of well-
formed sentences. It is the job of the linguist to describe the
universals of language--the essential but abstract categories and relations
which constitute linguistic deep structure--and to relate them to the
structure of actual sentences in a language, as they appear on the
surface. But further, for Chomsky the ultimate aim of linguistics is to
contribute to the study of the human mind. A grammar must therefore not
only be descriptively adequate, it must also have explanatory adequacy:
it must explain the processes that underlie the functioning of the "native
speaker-hearer." This obviously gives psycholinguistics a very central
place. And for Chomsky, what particularly characterises the native
speaker is his grammatical intuition and his language creativity.
Because he has intuition, he can distinguish grammatical sentences from
ungrammatical ones, and because he is creative, he can understand and
produce sentences which he has never heard before. These are not theories
in the strict sense. They are deeply held convictions about the essence
of things--convictions which determine the areas of interest and the
methods of approach.

Chomsky also has very strong convictions about the form that a theory
and a description of language must take. The theory must be generative,
that is, it must result in a description which is explicit. This in turn
means that it must be of such a nature that a logical machine could,
given the elements and the rules, generate only the well-formed or
grammatically correct sentences of the language and none of the ill-formed
ones.
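
The idea of a "logical machine" that generates only well-formed
sentences can be made concrete with a small sketch. The rules and
vocabulary below are a hypothetical toy fragment invented for
illustration, not a grammar proposed in the literature discussed here,
and the fragment is deliberately non-recursive, so unlike a natural
language it yields only a finite set of sentences.

```python
from itertools import product

# Hypothetical toy phrase structure rules (illustrative only).
# Each symbol maps to its alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [["girl"], ["woman"]],
    "V": [["spoke"], ["liked"]],
}


def expand(symbol):
    """Return every word sequence the rules derive from symbol."""
    if symbol not in RULES:  # a terminal: an actual word
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Expand each symbol on the right-hand side, then combine
        # one alternative from each position in order.
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


# The complete set of sentences this toy grammar generates.
SENTENCES = {" ".join(words) for words in expand("S")}


def is_well_formed(sentence):
    """A string is well-formed here only if the rules generate it."""
    return sentence in SENTENCES
```

Under these assumed rules, is_well_formed("the girl liked the woman")
holds while is_well_formed("girl the liked woman") does not: the machine
applies the rules mechanically and needs no intuition about the language.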

Through very closely argued reasoning, starting from these premises
(among others), Chomsky arrived at the conclusion that the desired
axiomatisation can best be achieved by employing 2 sets of rules--phrase
structure rules to account for the deep structure and transformational
rules to link up the deep structure with the surface structure. We are
here in the area of linguistic theory proper. There has been a great
deal of work done within theoretical generative linguistics, and most of
it has been devoted to exploring the possibilities of accounting for
language in terms of these 2 sets of rules.

Psycholinguists within the generative framework have accepted both
Chomsky's presuppositions about the nature of language and language use
and his specific linguistic theory. They also accept one further
characterisation of native speakers--that they possess a language faculty
which consists of a competence component and a performance component.
When native speakers distinguish well-formed sentences from ill-formed
ones, they do so by virtue of their own competence. The aim of the
linguist is to describe this native-speaker competence both in terms of
certain grammatical categories and in terms of phrase structure rules
and transformational rules. It is these categories and rules that the
native speaker must know--tacitly or intuitively--in order to function
as a native speaker. This is an extremely opaque area. The native
speaker must in some way have access to this knowledge--otherwise it
might as well not be there--but the competence component is in no way
active. It is the performance component which underlies the production
and recognition of actual sentences, and which is also responsible for
the errors and shortcomings of actual language use.

The acceptance of the competence/performance distinction and of the
characterisation of competence in generative terms led psycholinguists
to concentrate on 2 areas. The first is sentence processing: how is it
that native speakers can effect the conversion between the deep
structure of sentences, which is where the basic meaningful syntactic
relations are given, and the surface structure of the sentences of everyday
language, where the essential relations may be obscured in a variety of
ways? For instance, in a sentence such as,

     The tall girl I spoke to just now is German.

we know that the noun phrase the tall girl is both the subject of the
matrix sentence ("the tall girl is German") and the object of the
inserted clause ("I spoke to just now"), but there is nothing in the
sentence which overtly marks the object function. (In this particular
sentence it can be marked overtly by inserting the relative who, but
that is not the point.)

By way of attacking this problem, investigators concentrated on
transformationally-related sets of sentences. In some experiments they
measured how long it took subjects to match up sentences which belonged
together in a transformational set--John liked the old woman; The old
woman was liked by John; Was the old woman liked by John; The old woman
wasn't liked by John, etc.--when such sentences were scattered among ones
belonging to other transformational sets (Miller, 1962). In other
experiments subjects were asked to memorise sentences to see if those
requiring more transformations to generate their surface structures were
misremembered as ones requiring fewer transformations. Since forgetting
usually involves simplification, this would indicate that transformational
complexity indicates psychological complexity, and hence would support the
theory of the transformational nature of competence (Mehler, 1963).

The second area is that of child language acquisition. According to
Chomsky, children are born with an innate knowledge of language
universals. The process of language acquisition is one of learning how to
match up these innately-given universals with the surface structure of
whatever language children happen to be exposed to. So the study of
child language--before they have learnt to use the transformations which
result in adult sentences and while they are learning--could provide
empirical verification of the postulated universals.

This specific interpretation of psycholinguistics has undoubtedly
been the most dominant one in recent years, both in the sense that a
great deal of work has been carried out within this framework and in the
sense that it has been this approach which has made the most impact on
the world. Though it will no doubt continue to be very influential for
some time to come, this theory, in its pristine form, is probably not
now held by very many active workers in the area of psycholinguistics.
There are a number of reasons for this, but it is chiefly due to the
changes that are going on within generative linguistics itself.

Two fundamental aspects of Chomsky's standard theory have been
vigorously challenged by some of the younger generative linguists. For
instance, in Chomsky's view, the central component of language is
syntax, with semantics and phonology as secondary, or "interpretative,"
components. McCawley and others reject this, seeing the semantic
component as the central generative component. And if the deepest
structures are semantic, then the status of a separate and distinct deep
syntactic level becomes questionable and, in any case, much less
important.

In the last few years it has been the nature of semantic structures
which has engaged attention, and while sentences, of course, continue to
be studied, a great deal of both linguistic and psycholinguistic
research focuses on words. Linguists concentrate on the autonomous
semantic structure of words, while psycholinguists--and others who do
not necessarily think of themselves as psycholinguists in the restricted
sense--seek to account for the processes which permit the language user
to perceive words, understand them, store them in memory, and retrieve
them when wanted.

The other basic view which is being challenged is that language as a
whole should be regarded as a self-contained system, to be described and
explained without reference to any of the vagaries of actual language
use or to diversity among language users. According to this view, the
meaning of sentences is fully determined by reference to language
relations and language elements exclusively. The critics of this view
argue that the full interpretation of sentences often depends on the
listeners' "knowledge of the world," that is, on their knowledge of
extra-linguistic facts. For instance, when a British doctor says, "I'm
sorry, I cannot help you; you must go to your own doctor," this can be
properly understood only if one knows that there is a medical rule in
the United Kingdom that a doctor can treat only those patients who are
registered with him or who have been sent to him by the doctor with whom
they are registered.

Nobody denies that "knowledge of the world" enters into the
understanding of language. The point at issue is whether it is so
important that it must be incorporated into linguistic theory. The
critics claim that it must be. And if it is, then, of course, language
can no longer be treated as a self-contained system.

This issue has not been resolved, and the ramifications it produces
extend into the psycholinguistics of the understanding of words. There
is a fairly commonly-held view that the semantic structure of words is
to be accounted for in terms of a partially hierarchically-ordered set
of features. For instance, the word woman contains the features animate,
human, adult, female, neutral with respect to status, etc. If one holds
the view that language is an autonomous system, words have absolute
meanings which are the sum of their features. And to understand words
and to distinguish the meaning of one word from all the others, the
language user must in some way call up the whole bundle of features
which characterises each word. This again amounts to implanting the
description of the linguist into the head of the speaker.

Another commonly-held, and differing, view (for instance, for words
designating objects) is that semantic features are a subset of
perceptual features, and that the number of semantic features that need
to be invoked is variable, depending on the setting and on the intention
of the speaker. Thus, a white, round block will be designated the white
one if it is among round blocks of other colours, and the round one if
it is among white blocks of other shapes (Olson, 1972). And when the
listener tries to identify the object which the speaker designates, he
has to call up only those features which, in a given situation, are
sufficient to identify the object, not the whole bundle. Olson goes on
to argue that the meanings of words are built up slowly in the native
speaker, through an accumulation of experiences in which the various
perceptually-distinct features in turn become distinguishing. (The
implication is that words have partially different meanings for
different people, which will not surprise any non-linguist.)
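
Olson's example of the white, round block can be made concrete with a
small sketch. The Python below is only an illustration and not Olson's
own formalism: the function name distinguishing_features, the
representation of objects as dictionaries, and the example blocks are
all assumptions introduced here. It looks for the smallest set of
features that sets the intended object apart from the others present.

```python
from itertools import combinations


def distinguishing_features(target, others):
    """Return the smallest tuple of feature names that singles out target.

    target and others are dicts mapping feature names to values
    (a hypothetical representation chosen for this sketch).
    Returns None if target cannot be told apart from some other object.
    """
    names = sorted(target)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # The subset works if every other object differs from the
            # target on at least one feature in the subset.
            if all(any(other.get(f) != target[f] for f in subset)
                   for other in others):
                return subset
    return None


white_round = {"colour": "white", "shape": "round"}

# Setting 1: round blocks of other colours.
among_round = [{"colour": "red", "shape": "round"},
               {"colour": "blue", "shape": "round"}]

# Setting 2: white blocks of other shapes.
among_white = [{"colour": "white", "shape": "square"},
               {"colour": "white", "shape": "triangular"}]

by_colour = distinguishing_features(white_round, among_round)
by_shape = distinguishing_features(white_round, among_white)
```

On these assumptions the same block is picked out by colour in the first
setting and by shape in the second, which is the variability the
paragraph describes.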

The intellectual energy which for some years converged on the
development of a single set of presuppositions has now become diffused
into a number of distinct, though related, approaches within the
generative framework. At the same time, other approaches--some new, some
temporarily submerged--are beginning to be heard again quite generally.
These approaches are very diverse, but in none is language treated as a
self-contained system and in all meaning is regarded as the base
component. Some of these treatments are more or less directly derived
from general psychological theory. For instance, Skinner's (1957)
account of verbal behaviour is a straight extension of his general
theory of learning. His approach is uncompromisingly communicative: the
categorisation of utterances is in terms of what the speaker is seeking
to achieve.*

Osgood's (1957a) psycholinguistic approach is similarly rooted in
straight psychological theory, being a sophisticated version of
"neo-behaviouristic" mediation theory. Unlike Skinner, who arouses a
peculiarly emotional opposition in many psycholinguists, Osgood has had
a considerable influence both in theoretical and practical circles. His
semantic differential technique and specific version of a semantic
feature analysis have been widely used in research on bilingualism (see
Jakobovits, 1970). And his cognitively-based communications model
(Osgood, 1957b) was adopted as the theoretical basis of a widely-used
diagnostic test of the processes underlying children's ability to use
language (see p. 9).

Cognitive psychologists have always been concerned with language
because of its close connection to cognitive development and cognitive
structures. With the resurgence of interest in cognitive psychology in
the last 15 to 20 years, the influence of men like Piaget and Bruner has
been enormous. On the whole, however, cognitive psychologists have
tended to treat language and language utterances as a means of studying
something else, i.e. thought processes, rather than as an object of
study in their own right, so the influence of these psychologists has
tended to be pervasive rather than specific. But in recent years there
has been a very interesting convergence between some
generatively-oriented psycholinguists and others who are
Piaget-oriented. Studies have appeared in which the child's
comprehension and use of language has been directly related to his
ability to carry out certain operations, e.g. the ability to recognise
when something is more or less than something else (Donaldson and Wales,
1970), or to recognise certain logical relations, like the
agent/recipient relation (Sinclair-de-Zwaart and Flavell, 1969).

Chomsky's ideas and theories have made a great impact also within
main-line psychology. He has helped to reawaken the interest of
psychologists in language function, and all undergraduate curricula in
psychology now have a psycholinguistic component. But there has not been
a revolution--in Kuhn's sense of the term--in psychology as there
manifestly has been in linguistics. There are several reasons for this.
One is that there are too many psychologists in the world, pursuing too
many different aims. Another and more directly relevant one is that
Chomskyan psycholinguistics has no learning theory. If one accepts the
idea of innate knowledge, one is effectively absolved from studying the
processes of learning and from trying to account for them.

In sociolinguistic circles the main dissatisfaction is with the
concept of the "idealised native speaker-hearer." Hymes (1971), Labov
(1969), and others do not accept as adequate the notion of the native
speaker as a sentence-producing and sentence-judging machine, chugging
away regardless of circumstances. They argue strongly that, in addition
to accounting for how people construct sentences, it is also necessary
to account for how they learn when and how to use them.

*I include Skinner in this survey because, while in linguistic and
applied linguistic circles there is a general impression that he was
killed off some time ago, there is, in fact, a very busy and lively
group of people working away on operant conditioning and verbal
learning, quite undeterred by fashions in other circles. And there are

Another recent development, specifically relevant to language
testing, is the rediscovery of the virtues of pragmatics. This is a
general trend (following naturally from a preoccupation with formal
systems), but in testing its chief advocate is John W. Oller, Jr. (see
pp. 39-57 of this volume for a discussion of his views). There is no
point in setting down the arguments in favour of the pragmatic view
twice, but it may be useful to examine briefly the relation between
pragmatics and the various interpretations of psycholinguistics.

In the strict sense of the term, pragmatics derives from the study
of formal systems. In such systems, syntax deals with the structure of
expressions, semantics deals with the meaning of expressions (without
reference to anything outside the system), and pragmatics is concerned
with the relationships between expressions in the formal system and
anything else outside of it. When Oller (1970c) states that "pragmatic
facts of language are those having to do with the relations between
linguistic units, speakers and extralinguistic facts (p. 95)," he is
using the term more or less in its original sense. According to this
definition, pragmatics is a superordinate term, covering any kinds of
interdisciplinary activities in which linguistics is
involved--psycholinguistics, sociolinguistics, speech and communication
studies, etc. But Oller goes further; he asserts that language cannot
usefully be studied as a self-contained system, that pragmatic facts
must be built into the linguistic account itself. This makes claims
about what linguistics should be about and obviously runs into a fair
amount of opposition.

The pragmatic facts Oller is particularly concerned with are the
processes of comprehension. For him, comprehension is not a matter of
computing compatible interpretations of sentences in a vacuum; what is
important is the expectations of the listener. Language in use is always
concerned with something; listeners expect that what they hear will make
sense, so they match up the incoming signals with what they know about
grammar and discourse and the world. In a very informal way, this is in
line with current cognitive trends in psychology where perception and
comprehension and recall are thought of as an active process where the
individual matches his existing structures with the outside signals he
is receiving (Neisser, 1967).

It is important to note that Oller's concern is with the
comprehension of passages rather than with the interpretation of single
sentences. This is the continuous concern in educational circles, since
this is what a reader has to do in real life. The original
competence/performance distinction was formulated in terms of
well-formed sentences, considered one at a time, and the utterances that
matched, or did not match, such sentences, with no built-in provision
for knowledge of the world or the intentions of speakers or the
expectations of listeners. To the extent that we do concern ourselves
with discourse rather than with sentences, and admit the relevance of
extra-linguistic factors, the applicability of the
competence/performance concept, as formulated by Chomsky, becomes
doubtful, if not irrelevant.

The problem of relating surface structure to deep structure remains.
The difficulty is that nobody is very sure what deep structure is any
more. It seems intuitively more satisfying to suppose that cognitive
and relational notions such as agents and actions, assent and denial,
and location in space and time are at the base of language, rather than
nodally-related noun phrases and verb phrases, but we are a very long
way from any kind of explicit model, or even one that is moderately

The current concern with comprehension relates to the view that
language is to be regarded as a means of communication, both in mother
tongue and second language teaching. (This is important from the
standpoint of educational needs, as well.) One approach to the problem
of comprehension is to try to analyse the component skills. Carroll
(197?a), evaluating research carried out on a high school population,
suggests that the components may be lexical knowledge, grammatical
knowledge, the ability to locate "facts" in paragraphs (which presumably
involves knowledge of the rules of discourse), and the ability to make
inferences, i.e. to go beyond the data given. He suggests further that
the first 3 may be analysed in terms of the actual language used, while
the fourth would be some sort of general cognitive ability. Many would
consider that this sort of approach is not psycholinguistics. It depends
on one's definition of the term. If psycholinguistics must have its
roots in theoretical linguistics and/or theoretical psychology, then it
isn't. But if psycholinguistics is a general term covering studies into
the human processing of language, then it is.

PSYCHOLINGUISTICS AND TESTING

The essential truth about nearly all kinds of tests is that the only
theory they are based on is test construction theory, which is a kind of
applied statistics. Current intelligence tests are not based on any
coherent or explicit cognitive theory; language tests are not based on
any coherent or explicit psycholinguistic theory. Their sole
justification is that they work, i.e. one can make better decisions on
the basis of the information that they provide than one could without
that information.

Practices in language testing are influenced by 2 things: by
pre-theoretical views about the nature of language and language use and,
as Upshur (1972) has pointed out, by trends in teaching practices.
Language teaching practices are in turn influenced by a variety of
factors: by economic circumstances (for instance, during the depression
in the United States, only reading skill was aimed at because the
planners could not assume that students would be able to receive more
than 2 years of language learning in school); by general educational
trends; occasionally by linguistic, psychological, and sociological
theory; but, again, perhaps most of all by convictions about the nature
of language.

From time to time linguists have had great influence on language
teaching. In the early part of the century, Jespersen provided a
scholarly grammar of English which was also admirably suited for
pedagogical purposes. Moreover, because of his views about language and
about life, he was a vigorous and influential advocate of what would now
be called the situational approach.

The structural linguists of the 1940's and 50's were again highly
influential, both in terms of getting across their views on the nature
of language and in terms of linguistic description. The view that spoken
language is primary led to a considerable increase in emphasis on spoken
skills, which in turn led to the construction of tests for spoken
language. Three-quarters of Lado's (1961) pioneering work on language
testing is devoted to the description of testing formats which deal with
spoken language in some way. Curricula were drawn up in terms of lists
of specific syntactic structures, selected and arranged according to the
prevailing grammatical descriptions. This fits in admirably with
multiple-choice testing techniques of the sort that are now sometimes
referred to as discrete-point testing. It was under these influences
that language testing became a flourishing business and an important
part of language teaching technology.

When it comes to generative linguistics and generative
psycholinguistics, the impact on classroom teaching trends has been
minimal. "...an examination of recent second language textbooks shows
how little of any consequence has been contributed by the theory of
transformational grammar itself to the development of teaching material"
(Lamendella, 1969, p. 270). This no doubt sounds paradoxical in view of
the tremendous amount of debate and discussion and persuasion that has
been going on. What has happened is that teachers, as usual, have been
selective. Some of Chomsky's terminology and some of his views about the
nature of language have been enthusiastically adopted. The notion of
"creativity" has been accepted as central. The concept tends to be used
in a general liberating sense, referring to the mysterious complexity of
language and the untrammelled capacity of the native speaker to exploit
its resources. It is not, in general, thought of as having any
particular or restricting psycholinguistic or pedagogical implications.
"The intuition of the native speaker" is a handy phrase for replacing
the awkward "Sprachgefühl." "Competence" is popularly proclaimed to be
the aim of language teaching, but it is difficult to see what practical
consequences this reformulation has had. The idea of a separate and
distinct "faculté de langage," not subject to the same principles of
development and learning as other human capacities, has, however, caused
much confusion and uncertainty (cf. Carroll, 1971). What is a teacher to
do in the face of this mysterious faculty, particularly when the
essential categories and relations of language are said to be innate
anyway?

We are back to the absence of a psycholinguistic theory of learning.
In oversimplified presentations, habit has become a dirty word, but
nothing workable has been put in its place. Appeals have been made to
rule-based or cognitive learning, but this, in Carroll's phrase, is
merely "a kind of verbal overlay" which tends to add to the confusion,
since it is in contradiction to the idea of the separateness of the
"faculté de langage."

At the level of theory, as distinct from pre-theoretical
considerations, the preoccupation with transformational rules has had
only limited effects, and none in the classroom. Jakobovits (1970) was
brave enough to offer a specific suggestion. Though he thought there was
no theoretical basis for imitation and repetitions, he suggested that if
exercises are to be given, they should provide practice in carrying out
transformational conversions between different, but
transformationally-related, sentences:

    From a theoretical point of view the development of grammatical
    competence should be facilitated by getting the learner to perform
    a set of transformations on families of sentences (e.g.: I cannot
    pay my rent because I am broke; if I weren't broke I could pay my
    rent; given the fact that I have no money, I cannot pay my rent;
    how do you think I could possibly pay my rent if I am broke; since
    I am broke the rent cannot be paid; to pay the rent is impossible
    given the fact that I have no money) (p. 106).

This and other similar suggestions have failed to be convincing.

The reason why generative linguistic theory and generative
psycholinguistic theory have had so little impact on language teaching
practices is that teachers have, rightly or wrongly, failed to discover
any relevance in it to their work. There have been no empirical
consequences that are of any great practical value. There have been
attempts to teach languages through some sort of transformational
approach (e.g. Rutherford, 1968), but they have met with no great
success; teachers and theoreticians dislike them equally. There is,
however, one field in language teaching where transformational grammar
could become important. It depends on whether the tentative return to
being willing to give some grammatical explanation in the classroom will
gain ground. An eclectic pedagogical grammar would be certain to contain
transformational accounts of selected grammatical areas (see Allen and
Widdowson, 1975).

Though no transformational influences have reached language testing
via the teaching situation, there have been theory-motivated attempts to
try out transformational-type tests directly. When Pimsleur started
assembling his Language Aptitude Battery (1966), he tried out a subtest
requiring skill in embedding sentences, then referred to as double-based
transformations. The students were presented with pairs of sentences
which they had to transform on the lines of the model:

    John ...
                            John claims ... is right.
    John is right.

The subtest failed to show any correlation with success in learning
foreign languages and was eliminated very early.

Similar attempts to derive tests directly from transformational
theory have not proved workable. Brière (1972) incorporated a
transformational subtest in the series of tests which were developed to
test the proficiency of American Indian children. The children were
given a simple declarative sentence and were asked to transform it into
a negative, an imperative, or a question. Neither the Indian children
nor the control group of native English-speaking children had much
success with it. And in the administration of another subtest,
consisting of having children repeat simple sentences, the investigators
found no relationship between the transformational complexity of the
various sentence patterns and the children's ability to repeat
correctly.

In contrast, the Illinois Test of Psycholinguistic Abilities (Kirk
et al., 1968), which is explicitly based on Osgood's communications
model and empirically researched in the usual psychometric way, is
widely used as a diagnostic tool with children showing various kinds of
language or developmental deficiencies. This presumably is because
people who have to make practical decisions about individual children
have found that it has practical value. The test has also been used with
normal children, and various subtests have been found to correlate with
reading skill (Newcomer et al., 1975) and with language dominance in
bilingual children (Zirkle et al., 1974). Although it is claimed that
the test is theory-based, when one examines the various subtests, they
seem to be based on very broadly-conceived notions of human functioning,
rather than on any very specific model. This may have something to do
with its usefulness in dealing with complex skills. (The subtests are
concerned with memory for auditory and visual signals; the ability to
check truth value of spoken and written statements with reference to
knowledge of the world and to pictures; the ability to supply missing
words in analogical frames, to describe pictures, and to show knowledge
of vocabulary items; as well as the ability to deal with more formal
aspects of language: spelling simple words; supplying missing letters in
words; supplying the correct inflections of nouns and verbs; and
choosing appropriate comparators, prepositions, and anaphoric pronouns.)

Oller's pragmatics underlies his very active and successful advocacy
of the use of cloze tests for testing foreign language proficiency, as
well as of the use of dictation. Both techniques represent very
interesting and promising new departures in testing. This might appear
to contradict the opening statement in this section that tests are not
based on theory, but I regard Oller's pragmatics not as a theory in the
strict sense, but as a conviction about the nature of language and
language use--and such convictions do influence testing.

Cloze tests and dictation tests are 2 of the main new features of
language testing. They represent not only pragmatics, but also the new
interest in "global" or "integrative" techniques of testing, as distinct
from discrete-point testing. This again derives from convictions about
the nature of language: if language learning is to be regarded not as
the mastering of a series of grammatical structures or transformational
sets, but as learning how to communicate effectively, then perhaps
testing should elicit "language behaviour" rather than "language-like
behaviour."

Discrete-point testing has been strongly criticised in some of the
recent testing literature. There have even been suggestions that such
tests should be done away with altogether. In order to discuss the
controversy, it is necessary to return briefly to a consideration of
test theory.

Tests are interesting only when they work, that is, when they
accurately measure the characteristics we wish them to measure. That
brings us to the knotty problem of validity and criterion measures. How
do we know that a test works? The standard answers are: because we get
high correlations with total scores on test batteries made up of a
number of different subtests or because we get high correlations between
test scores and external estimates of the characteristic we are
interested in. (Strictly speaking, this only pushes the problem one step
further back, i.e. how do we know that the test battery or the external
criterion is itself valid? But that is not my concern in this article.)

By way of illustrating the problem, let us consider 3 tests: (a) a
sound discrimination test (Ingram, 1968); (b) a dictation test (Oller),
with revised figures given by Rand (1972); and (c) a test of oral
communication (Palmer, 1972). All 3 tests involve spoken language, but
are otherwise very different. For the sound discrimination test,
students have to match a single word recorded on tape with one of 3
written words, and this is neither integrative nor language-like. The
dictation test is derived from a familiar classroom device, and is
integrative but not exactly reflective of language behaviour. The test
of oral communication is an extremely interesting (and recent)
development foreshadowed by Upshur's (1972) discussion of the need for
such tests. The examiner and the student look at strips of 4 or 5
pictures. In one version the student asks questions until he identifies
the picture the examiner has in mind. In another version the student has
to describe the picture he has selected until the examiner can identify
it. The scoring is either time needed for identification or time plus
error count. This test is both integrative and elicits language
behaviour.

The first requirement of a new test is and always has been that it
must correlate highly with the total score on a battery of assembled
subtests. Most batteries contain a mixture of discrete-point and
integrative subtests; for instance, grammar tests are usually
discrete-point and comprehension tests are integrative. Sample
correlations for the 3 tests cited above, with their respective
test-battery totals, are:

                                         N      r
    Sound discrimination               320    .85
    Dictation                           81    .84
    Oral communication (pilot test)     33    .79

All the totals include a grammar and a comprehension subtest, as well as
other types of subtests, so in all of them both integrative and
discrete-point elements are present.

The consequence is that the only new tests which survive, whether
discrete or integrative, are those which work in the same direction as
the cumulative total of both integrative and discrete-point results. If
there really is a clear-cut distinction between integrative techniques
and discrete-point techniques, then by our procedures we deliberately
fudge the distinction.

A new test should also correlate with some external criterion. Because
of the need to place overseas students appropriately in universities
and colleges, grade point averages (GPA) are often taken as such a
criterion. The correlations obtained between any subtest or test total,
whether integrative or discrete-point, with GPAs are extremely low. This
is obviously because factors other than command of the language which is
the medium of instruction enter into academic success. So GPA is not a
good criterion for estimating how good a test is as a measure of
language command. A better criterion is to be found in the judgement of
experienced teachers. Ranking lists produced by teachers who know their
classes well probably constitute the most valid criterion available,
according to Vernon (1960) (provided ranking includes only one group
or class at a time). Part of the validation of the English Language
Battery (Ingram, 1970) consisted of correlations between subtest and
total scores and teachers' rankings. The teachers were asked to rank
for all-over command of English. In the table below some of these
correlations are set out for 2 discrete-point tests and 2 integrative
tests. Because of the vagaries of figures relating to small numbers, I
have quoted results for 3 groups, all of which were made up of young
adult students. (The ranks were converted to z-scores before they were
correlated by the product-moment formula.)
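The rank-to-z-score step mentioned in the parenthesis above can be sketched as follows. The text does not say which conversion was used; this sketch assumes the common normal-deviate method (each rank is turned into a percentile position and then into the corresponding point on the standard normal curve). The function names and the six-student data set are invented for illustration only.

```python
from math import sqrt
from statistics import NormalDist, mean


def ranks_to_z(ranks):
    """Convert teacher ranks (1 = best) to normal-deviate z-scores.

    Assumption: rank r out of n is placed at percentile (n - r + 0.5) / n,
    so that the best-ranked student gets the highest z-score.
    """
    n = len(ranks)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((n - r + 0.5) / n) for r in ranks]


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical class of six students: teacher ranking and subtest scores.
teacher_ranks = [1, 2, 3, 4, 5, 6]
test_scores = [58, 61, 47, 50, 39, 35]

z_scores = ranks_to_z(teacher_ranks)
agreement = pearson_r(z_scores, test_scores)
print(round(agreement, 2))
```

Because each class is ranked separately, a conversion of this kind would be applied one group at a time, in line with the proviso that a ranking should cover only one group or class.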



Subtest                     Type

Sound discrimination        discrete-point
Listening comprehension     integrative
Grammar                     discrete-point
Reading comprehension       integrative

(Coefficients legible in the source: .38 and .78 under "discrete-point";
.43 and .70 under "integrative." The remaining entries for the 3 groups
are not recoverable.)

There is nothing in these figures to suggest that there is any
intrinsic difference between discrete-point tests and integrative tests
as regards the amount of agreement obtained between test scores and the
estimates of teachers. It is, of course, true that these subtests and
their items are ones which survived several stages of item analysis and
correlations with totals, but that is true of any properly researched
test. It is, in any case, quite unnecessary to suppose that one has to
make an either/or choice, that if one approves of integrative tests, one
should therefore disapprove of discrete-point ones. This "disjunctive
fallacy," as Carroll calls it, starts, it seems to me, from
misunderstandings about the nature of language command.
Firstly, a test seeks to measure accurately a given characteristic
in an individual drawn from a given population. The most obvious way of
finding out how good a person is at doing something is to take a job
sample: if you want to know how good somebody is at writing an essay,
set him to write an essay! But as is well known, due to a number of
operative factors, the results may well vary: people write better essays
at some times than they do at others, and, just as important, judges
judge better at some times than they do at others. Job samples are
inherently valid, but tend to be unreliable because of this variability,
thus lowering their validity. This is why multiple choice testing came
into existence: multiple choice tests are highly reliable when properly
constructed. But they do not necessarily possess inherent validity; in
language testing they may, for instance, elicit language-like behaviour
rather than language behaviour. So the validity, or lack of validity,
of such tests has to be empirically demonstrated by comparison with a
criterion. This, in my opinion, also holds true for job-sample tests.
Once the validity has been demonstrated, however, it is immaterial what
type of test we are dealing with. If a test works for the purpose it
was intended to, then that is all that matters.

Secondly, though we have no great understanding of the nature of
language processes, we at least know that they are very complex. It is
therefore highly unlikely that any single type of test will reflect all
the facets of that very intricate human faculty: language command. For
any full assessment, as distinct from quick screening jobs, a number of
different types of subtests are more likely to give an accurate picture
than any single measure and, within limits, the more different the
subtests are, the greater the chances of sampling language behaviour
adequately.

A PEDAGOGICAL APPROACH 

Testing is an educational method. For a very small minority it is a
subject of study and research in its own right, but in the wider context
it is a practical tool, ancillary to teaching. Most teachers are
practical people--they have to be--and on the whole they do not find
highly abstract theoretical models very useful. In order to be useful,
an analysis of learning--a description of language--must relate fairly
straightforwardly to actual situations, to directly observable and
recognisable dimensions. There is no implied criticism here of either
teachers or theory-makers; I am merely stating a truism about their
different preoccupations and purposes.

If testing is ancillary to teaching, then it must be accounted for
on the same basis as teaching, that is, in terms of a not too abstractly
formulated model of learning, in my view. It is not necessary or useful
to set up a model for language learning which presupposes that language
learning is essentially different from all other kinds of learning. But
it is essential to look for an account in which there are distinct,
though interdependent, learning processes. I believe this to be true at
any level of abstractness, but certainly, at a practical level, it is
the only way of making sense of the many highly diverse teaching and
testing practices which actually occur, and which actually work, when
appropriately employed. The most useful account of this sort that I know
of is Gagné's (1965) analysis of the conditions of learning, which is
specifically aimed at educational contexts. I have elsewhere attempted
to show its relevance to the second language learning and teaching
situation (Ingram, 1975). I shall not repeat the arguments here, but, by
definition, if the account is relevant to learning processes, it must
also be relevant to testing practices (insofar as they work).

Gagné recognises a number of types of learning, all hierarchically
related. The least complex and most basic type is a very simple form of
perceptual learning--learning to recognise and distinguish recurring
objects and events. This underlies all other forms of learning and is
generally difficult to exemplify in a pure form, because most of it
happens in early childhood, and other and more complex forms of learning
supervene almost immediately. But the process is very clear in second
language learning. For instance, in order to differentiate

entendre; attendre [ɑ̃tɑ̃dr]; [atɑ̃dr]

one must distinguish nasalised from non-nasalised vowels, and in order
to differentiate

cent vents; cent vins [sɑ̃ vɑ̃]; [sɑ̃ vɛ̃]

one must be able to tell one nasalised vowel from another.

Gagné makes it quite clear that there is no disjunction between this
kind of very basic learning and the more complex forms, such as concept
learning. Concept learning is essentially a matter of learning how to
categorise partially different objects or events under one heading
because they possess certain criterial characteristics, e.g. Alsatians
and terriers and dachshunds are all dogs, or because they are
functionally equivalent in some way, e.g. guns and knives and arrows are
all weapons. There can be no concept learning of this sort unless
perceptual learning is secure, i.e. unless we have learned to identify
and distinguish objects and events in the first place. Similarly, the
ability to use words appropriately depends on a series of
conceptualisations: categorisations of non-linguistic objects and events
must be learned; semantic categories and syntactic classes must be
respected. The objects that are categorised may be more or less
abstract, but ultimately all concepts that have empirical reference
relate back to the perceptual world and perceptual learning.

Two other forms of learning which Gagné recognises are chaining and
problem-solving. Chaining is that form of learning which enables us to
produce as a smooth sequence an activity which has several component
parts. For instance, verbal chaining enables us to produce and recognise
a phrase or an utterance as a unit, to operate rules of agreement and
concord without computing each element separately, to get word-order
right, to produce the more stereotyped utterances of social exchanges,
and to fit everything into the appropriate intonation and stress
contours.

In problem-solving learners have to structure a given task so that
they can make decisions about relevant concepts and procedures. Whenever
learners are asked to induce a given grammatical rule they are required
to produce problem-solving behaviour. Problems can be difficult in 2
ways, either because the necessary conceptual structure is complex or,
quite often, because the relevant concept just does not occur to people.
It is, for instance, fairly difficult for native speakers of Germanic
languages to induce unaided the rule for the use of the possessive son
and sa in French; the fact that it is the concept of grammatical gender
which is relevant, and the gender of the word functioning as object at
that, seems initially very strange to such speakers.

I have mentioned 4 of Gagné's forms of learning--perceptual
identification, chaining, concept learning, and problem-solving--and in
the barest possible way indicated the links with certain language
learning phenomena. Now consider them in relation to language testing
formats. Perceptual learning provides a direct rationale for tests of
sound discrimination, such as the one described earlier. Chaining can be
seen to underlie those test formats which test the learner's ability to
operate the obligatory rules of language, for instance those concerning
morphology and those concerning the sequencing of elements, and also
those formats which test the learner's easy recognition of predictable
patterns and conversational sequences. Concept learning is obviously
relevant to all aspects of language use. In testing it is directly
required by items which ask the learner to choose the appropriate form
in light of a given context: the learner must categorise the occasion
indicated by the context as being an instance of a class of occasions
which call for the selection of one language form rather than another,
for instance, the use of the present perfect rather than any other tense
form. Finally, problem-solving is an indispensable element in tests of
comprehension. The learner must assemble his knowledge of grammar and
vocabulary, his knowledge of the world and of the rules of discourse, to
enable him to identify the facts, and the conclusions and the
implications of the passages he is asked to interpret.


This approach is not as elegantly simple as Chomsky's model (or
Skinner's, for that matter). But it is serviceable, and that, in a
teaching/testing context, is what matters.

Psychometric Considerations

John L. D. Clark



In the language testing context, the term "psychometrics" can be most
usefully defined as any and all utilizations of numerical data and
related logical operations in the service of developing, using, and
interpreting the results of measurement activities carried out upon
language learners or potential learners. In any given measurement
activity, the psychometric procedures involved are properly dependent on
the purpose which the measurement activity itself is intended to serve,
and their appropriateness and adequacy are judged by the extent to which
they contribute to the accomplishment of the intended purpose.

It is useful, in this regard, to define three broad categories of
purpose within the language testing area. The first is prognosis,
briefly described as the prediction of an individual's future
achievements in language learning on the basis of currently available
measures of a linguistic or other nature. A second measurement purpose
is the evaluation of achievement, in which the intent is to determine
the extent to which the student has learned ("acquired," "mastered,"
etc.) elements of linguistic content formally presented in a course or
other controlled learning situation. A third broad area of measurement
purpose is the evaluation of proficiency, that is to say, the
determination of the extent to which the student is able to utilise the
tested language for such real-life receptive or communicative purposes
as reading magazines or novels, conversing with friends on topics of
general interest, and so forth. In proficiency testing, the manner in
which the measured proficiency has been acquired is not at issue:
indeed, the testing process and test content should be completely
independent of the student's language learning history.
In view of the extremely close relationship between the intended
purpose of a given measurement instrument and the psychometric concepts
and procedures appropriate to it, the discussions in the following pages
have been sectioned according to the three categories of testing
purpose identified. Within each section, the major concern will be to
identify those aspects of psychometric practice most suited to the
development, use, and interpretation of the test instruments in
question, and to relate these to the format, content, and pragmatic
purposes which the tests themselves are intended to serve.

PROGNOSTIC TESTING

The basic function of prognostic testing in the language learning
context is to use currently available information about a student to
predict the level of accomplishment which he or she is likely to attain
at some future time, after having followed a particular language
learning program or activity. The degree to which scores on a given test
or other quantifiable measure, such as rank in class or course grades,
can accomplish this predictive purpose depends on the extent to which
these data correlate, in a statistical sense, with achievement test
scores or other quantifiable criteria of "success" used at the
completion of the learning program. The correlational relationship is
usually expressed by means of the Pearson product-moment correlation
coefficient, which ranges in magnitude from zero, indicating a complete
absence of relationship between scores on the predictor and criterion
measures, to 1.0, indicating a perfectly consistent relationship. The
higher the correlation, the more accurate the prediction, in the sense
that there is a decreased statistical probability that a given
prediction will be inaccurate.
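For reference, the product-moment coefficient named above is conventionally written as follows. The symbols are introduced here for illustration and are not the author's: $x_i$ is a student's predictor score, $y_i$ the same student's criterion score, and $\bar{x}$, $\bar{y}$ the respective means over $n$ students.

```latex
r_{xy} \;=\;
\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
     {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\;
      \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}},
\qquad -1 \le r_{xy} \le 1 .
```

In full generality the coefficient can also be negative, down to $-1$ for a perfectly consistent inverse relationship; for prediction it is the magnitude that indicates how closely predictor and criterion go together.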

The development of effective prognostic techniques is thus, in large
part, an attempt to find tests or other measures which correlate highly
with appropriate indices of (later) language success. Grade averages,
rank in class, tests of general intelligence, and other measures
generally available in student records have for many years served in the
prediction of language learning success; these efforts have been
reviewed by Henmon et al. (1929), Salomon (1954), and Pimsleur et al.
(1962). Predictive value has also been sought through more specialized
techniques, including tests of musical ability (Blickenstaff, 1963),
measures of articulatory precision (E. Pike, 1959), and psychological
profiles (Morgan, 1953).

Aptitude Test Development

An intensive search for effective predictors of language learning
ability that could be readily and uniformly administered to prospective
language students was carried out by John B. Carroll during the early
1950's in the context of the intensive foreign language courses
conducted at the Army Language School in California and at other
government training centers (Carroll, 1962). The research technique used
was to administer test batteries consisting of a large number of
experimental tasks to students entering the language learning programs,
and to select, through factor analytic techniques, a much smaller number
of tasks which preserved most of the predictive power of the original
larger batteries. The major outcome of the Carroll studies was the
publication of the Modern Language Aptitude Test (Carroll and Sapon,
1959), consisting of 5 separate subtests entitled Number Learning,
Phonetic Script, Spelling Clues, Words in Sentences, and Paired
Associates. Each of these subtests was intended to tap a competence
related to the ability to learn a foreign language without requiring the
student to be familiar with any language other than English.

Carroll's search for higher predictor-criterion correlations (and
hence, more effective prognosis) was relatively successful. In the test
manual for the MLAT, Carroll was able to report correlations as high as
.71 between MLAT scores and high school language course grades, compared
to correlations of [...] to [...] for the Otis I.Q. test and other
measures of general intelligence.

Although such results did represent an appreciable improvement in
predictive power, it should be noted that a correlation of .71 accounts
for only slightly more than half of the statistical variance present in
the criterion scores. The remaining "unpredicted" variance reflects
influences not accounted for by student scores on the prognostic test;
these may be hypothesized to include differing levels of student
motivation (which, in some instances, could counterbalance a lower
degree of intrinsic language learning ability), tutoring or other
special study opportunities during the course of instruction, and
various other factors.
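The "slightly more than half" figure follows from squaring the correlation: the proportion of criterion variance accounted for by a single predictor is r squared. A minimal check, with function names chosen here for illustration:

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by a predictor
    whose product-moment correlation with the criterion is r."""
    return r ** 2


def variance_unexplained(r):
    """Proportion of criterion variance left unpredicted."""
    return 1 - r ** 2


mlat_r = 0.71  # correlation reported in the text
print(round(variance_explained(mlat_r), 4))    # 0.5041
print(round(variance_unexplained(mlat_r), 4))  # 0.4959
```

So a correlation of .71 leaves just under half of the variance in course grades to be explained by motivation, tutoring, and the other factors listed above.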



Carroll Model of School Learning



A conceptual framework within which the nature and influence of
these "other-than-aptitude" variables might be empirically analyzed
was suggested by Carroll a number of years ago in his "model of school
learning" (Carroll, 1963). According to this model, a student's success
in accomplishing a given learning task can be represented as a
mathematical function consisting of the following elements: the
student's "aptitude" for the task in question; "ability to understand
instruction," as determined by measures of overall intelligence and
verbal ability; extent of "perseverance," as indicated by the amount
of time the student is willing to spend in active study and presumed
to reflect the level of motivation; the "time available for learning";
and the "quality of instruction" provided.

A powerful implication of such a model is the notion that students 
with a low level of measured aptitude for language study can, nonethe- 
less, reach an acceptable level of accomplishment if other variables 
in the equation are suitably adjusted--for example, if more formal
learning time is provided or if more carefully developed instructional 
materials are made available. These concepts may appear commonsensical 
to the practicing language teacher; nonetheless, their integration into 
the formal model proposed by Carroll is significant in that it clearly
postulates the contribution to be made by each variable toward a 
criterion of measured achievement. 
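The trade-off described above can be illustrated with a small sketch. It assumes one commonly cited reading of Carroll's model, in which the degree of learning depends on the ratio of time actually spent to time needed. The text itself gives no formula, so the multiplicative form of `time_needed`, the 0-to-1 ceiling, and all numeric scales below are hypothetical.

```python
def time_needed(base_hours, aptitude, quality_of_instruction,
                ability_to_understand):
    """Hypothetical form: hours needed grow as aptitude, instructional
    quality, or ability to understand instruction fall below 1.0
    (1.0 = typical on an arbitrary scale)."""
    return base_hours / (aptitude * quality_of_instruction
                         * ability_to_understand)


def degree_of_learning(needed, time_available, perseverance):
    """Ratio of time actually spent to time needed, capped at 1.0.
    Time spent is limited by both opportunity and perseverance."""
    spent = min(time_available, perseverance)
    return min(1.0, spent / needed)


# A low-aptitude student under ordinary conditions (invented numbers).
needed = time_needed(100, aptitude=0.5, quality_of_instruction=1.0,
                     ability_to_understand=1.0)
baseline = degree_of_learning(needed, time_available=120, perseverance=300)

# Same student, more formal learning time provided.
more_time = degree_of_learning(needed, time_available=200, perseverance=300)

# Same student, original time, better instructional materials.
needed_better = time_needed(100, aptitude=0.5, quality_of_instruction=2.0,
                            ability_to_understand=1.0)
better_materials = degree_of_learning(needed_better, time_available=120,
                                      perseverance=300)

print(baseline, more_time, better_materials)
```

Under these assumptions the low-aptitude student falls short with the ordinary allotment of time but reaches the criterion when either learning time or quality of instruction is raised, which is the implication the paragraph draws.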

In order to validate the Carroll model and render it useful for
instructional planning and prediction, it would be necessary to quantify
each of the component variables for experimental study. Detailed
procedures for gauging students' motivation for, and attitude toward,
language study have been developed by Wallace Lambert and his associates
(Gardner and Lambert, 1972; Lambert et al., 1968), and several of the
scales and questionnaires used in the Lambert studies have been
incorporated in the Foreign Language Attitude Questionnaire prepared by
Leon Jakobovits on behalf of the Northeast Conference on Foreign
Language Teaching (Tursi, 1970). Measures of this type could be expected
to serve as indicators of "perseverance" in the Carroll model. Measures
of "ability to understand instruction" are available in tests of general
intelligence. "Time available for learning" could be quite easily
quantified in a programmed instruction context and, notwithstanding
current difficulties in accurately measuring "learning time" in the
usual classroom and homework situation (Packard, 1972), effective
quantification in these settings is basically a matter of improved
observational and recording techniques.

The most elusive variable in the Carroll model is without doubt
that of "quality of instruction." However, the use of interaction
analysis procedures for classroom teaching (Moskowitz, 1970) and more
precise formulations of effective teacher behaviors as judged by panels
of experienced teachers (Hayes et al., 1967) provide encouraging signs
that reasonably satisfactory measures of this component may be available
within the not too distant future. Multiple regression techniques and
other statistical procedures are available for use with the Carroll
model as soon as the necessary measures have been defined and data
obtained for representative students and course combinations. The
considerable practical value of a predictive system based on this model
would be to permit a highly individualized prescription of the types of
courses and lengths of study that students having various combinations
of language aptitude, intelligence, and motivation would require in
order to reach defined learning goals.
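As a rough illustration of the multiple regression step, the sketch below fits a criterion score from two predictors by ordinary least squares, using only the standard library. The predictor names (aptitude, motivation) and every number are invented; no real student data are implied, and an actual study would use an established statistics package and far more cases.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_regression(predictors, criterion):
    """Return [intercept, b1, b2, ...] minimising squared error,
    via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, criterion)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, predictor_values):
    """Predicted criterion score for one student."""
    intercept, *weights = coefficients
    return intercept + sum(w * x for w, x in zip(weights, predictor_values))


# Invented (aptitude, motivation) scores for six students.
students = [(40, 3), (55, 2), (60, 5), (70, 4), (45, 1), (65, 3)]
# Invented criterion scores, constructed as 10 + 0.5*aptitude + 4*motivation
# so that the fit can be checked against known weights.
achievement = [10 + 0.5 * a + 4 * m for a, m in students]

coefficients = fit_multiple_regression(students, achievement)
print([round(c, 3) for c in coefficients])
print(round(predict(coefficients, (50, 4)), 1))
```

With real data the fitted weights would indicate how much each measured variable contributes to the prediction, which is the basis for the individualized prescriptions the paragraph envisages.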



Selection of Criterion Measures



So far in the discussion, interest has been focused on the predictor
measures as such, whether a single predictor--as represented by grade
average, I.Q. score, or MLAT score--or the multiple predictors implied
by the Carroll school learning model. The magnitude of any predictor-
criterion correlation is also highly dependent on the nature of the
criterion measure itself. It is unfortunately often the case that some
readily available measure--such as the final course examination or a
standardized test that happens to be on hand--will be adopted as the
criterion measure for a predictive study, with little consideration of
the extent to which it accurately represents the specific achievements
which the prognostic measure was originally intended to predict. For
example, a certain prognostic test that is intrinsically a highly
accurate predictor of listening comprehension and speaking ability might
show only moderate or low correlation with an end-of-course examination
consisting predominantly of reading comprehension questions and writing
exercises.

The proper selection of criterion measures is of special importance
in large-scale research studies aimed at the experimental identification
of promising predictor measures. Since the statistical procedures used
in these studies operate to maximize the prediction of the criterion
without regard to its nature, it is crucial that the criterion represent
the most valid measure of the desired achievement available. In this
regard, further advances in the area of prognostic measurement must
rely, at least in part, on corresponding increases in the sophistication
and precision of the criterion measures.

[...] the evaluation of achievement are focused on measuring
acquisition of course content--that is to say, those aspects of [...]
lexicon, and structure to which the student has been formally exposed
in textbooks, classroom sessions, or through other instructional means.

Within the achievement testing area, two subclassifications can be
made, based on the degree of detail which the test results are
intended to reflect. Tests which undertake to determine the student's
acquisition, or lack of acquisition, of discrete elements of course
content (for example, mastery of each of the vocabulary items introduced
in a textbook unit) can be referred to as diagnostic achievement tests.
General achievement tests, on the other hand, are directed at measuring
the student's ability to combine several different aspects of course
content in situations which more closely approximate ordinary language
use. Even though the content of a general achievement test may be more
global and more "realistic" than that of a diagnostic achievement test,
it continues to share the primary characteristic of all achievement
tests in that it is properly based on only those combinations and
recombinations of language elements that have previously figured in the
formal instruction.

It should be emphasized that the procedures followed in developing
an achievement test are uniformly applicable to any type of course or
course sequence. Regardless of the theoretical or pragmatic guidelines
used in the initial specification of course content (for example,
contrastive analysis, functional load, situational utility, or simply
the informed judgment of practicing teachers), the achievement
measurement question is always that of adequately representing--within
the content of the test itself--the content of the instructional
syllabus on which it is based.

With the possible exception of an achievement test on the first les- 
son of a beginning course, it would be impossible to include in any test 
instrument of administerable length all of the linguistic elements to 
which the student is exposed in the instructional setting. The specifi- 
cation of test content, in virtually every instance, must involve 
sampling, from among an extremely large number of potentially testable
elements, those which can be considered to stand in for a wide number
of similar elements not formally tested. Unfortunately, the
identification of meaningful domains of "similar" elements is an
extremely complex matter, and the more highly diagnostic the test, the
more evident are the problems involved.

Some of these difficulties have been previously cited (Clark, 1972b),
using as an example the diagnostic testing of "the written forms of the
French passé composé." Within this general area, the proper achievement
testing strategy would be to identify various content domains which
could be considered homogeneous for testing purposes in the sense that
student success or failure on a given item within the domain could be
taken as indicative of similar performance on the other items in that
domain. A proposed domain might be the different personal forms of a
single specified verb (je suis allé, tu es allé, il est allé, etc.).
Students answering a "tu es allé question" correctly would be expected
to perform correctly on a "je suis allé question" or on any other
component of this particular paradigm, and those failing the tested item
would be expected to miss the other items of the paradigm as well, were
they to be tested on them.

It is obvious that the way in which the domains are specified is of
crucial importance to the extrapolation of the testing results, and that
the student's language learning history must also be taken into account
in formulating these domains. For example, the je suis allé, tu es
allé... domain might be appropriate for students who have, in their
course work, been introduced to all personal forms of this verb. The
same domain would not, however, be usable for classes in which only the
"tu" form had been introduced at the time of testing.

For a given course of instruction, it might be feasible, although
certainly arduous, to specify a number of testing domains based on
careful analysis of the content and sequencing of the teaching
materials, and then to include each and every element of these domains
in a lengthy validation test. Domains for which student testing results
were not uniform across elements would be reformulated and retested; for
any domains showing homogeneous results, individual elements could
subsequently be drawn on a statistically random basis for inclusion in a
smaller, operational test form.
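The validate-then-sample procedure just described might be sketched as follows. The homogeneity rule used here (at least 90% of students answer every item in the domain alike) is an assumption chosen for illustration, as are the function names, the domain labels, and the response data.

```python
import random


def proportion_uniform(responses):
    """Share of students whose item scores (0/1) within one domain
    are all the same, i.e. all right or all wrong."""
    uniform = sum(1 for student in responses if len(set(student)) == 1)
    return uniform / len(responses)


def is_homogeneous(responses, threshold=0.9):
    """Hypothetical criterion for treating a domain as homogeneous."""
    return proportion_uniform(responses) >= threshold


def draw_operational_items(domains, per_domain=1, seed=None):
    """Randomly draw items from each validated domain."""
    rng = random.Random(seed)
    return {name: rng.sample(items, per_domain)
            for name, items in domains.items()}


# Invented validation-test results: one row of 0/1 scores per student.
validation_results = {
    "aller_passe_compose": [[1, 1, 1], [0, 0, 0], [1, 1, 1],
                            [1, 1, 1], [0, 0, 0]],
    "mixed_verbs": [[1, 0, 1], [0, 0, 1], [1, 1, 1],
                    [0, 1, 0], [1, 1, 0]],
}
domain_items = {
    "aller_passe_compose": ["je suis allé", "tu es allé", "il est allé"],
    "mixed_verbs": ["item_a", "item_b", "item_c"],
}

validated = {name: domain_items[name]
             for name, responses in validation_results.items()
             if is_homogeneous(responses)}
operational_form = draw_operational_items(validated, per_domain=1, seed=0)
print(operational_form)
```

Calling `draw_operational_items` with different seeds on the same validated domains would yield alternate forms drawn from the same content, which is the basis for the comparable parallel forms mentioned in the text.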

Within the usual classroom situation, the prospects for such detailed
test preparation activities would not seem encouraging. However,
educational publishers developing new textbook programs and accompanying
test materials might find this a reasonable procedure. It should also be
noted that such an approach would permit the essentially simultaneous
development of several alternate test forms, each having highly
comparable content and measurement characteristics.

A second fundamental difficulty in diagnostic achievement measurement
is that of designing testing formats and individual test items which
accurately and unambiguously measure the specific behaviors in question.
It is unfortunate that multiple choice procedures, although admirable
from the viewpoints of scoring speed and objectivity, do not lend
themselves well to the diagnostic testing of discrete linguistic
accomplishments. One drawback in this regard is the probability of
correct response by chance. This probability is at the highest level for
2-option or "true-false" items, for which the student has a 50% chance
of answering the item correctly in the absence of any knowledge of the
linguistic element tested. The probability of successful chance response
can be reduced somewhat by increasing the number of answer options per
item (in 4- or 5-choice items, the chance success probability is .25 and
.20, respectively), but beyond a total of 4 or 5 options per item, the
item writing task becomes extremely difficult and time consuming.

A second means of reducing the chance success factor is to incorporate
into the test more than one item based on the same content element and
to require the student to respond correctly to each of these items
before mastery of the element is assumed. The statistical probability of
a student's answering each of a series of multiple choice items by
chance becomes very low with just a few items (for example, .008 for a
sequence of three 5-choice items). Blatchford (1971) has made use of
this technique in developing a diagnostically oriented test of [...]
grammar. However, despite the statistical appeal of this procedure, the
time required for administration is appreciably increased when several
items must be presented for each of the elements to be tested. A further
drawback to this approach is that alert students may be able to note
certain formal similarities among the various items dealing with a given
element and deduce the appropriate answers solely on this basis.
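The chance-success figures quoted above can be reproduced directly, assuming blind guessing with equally likely options and independent items. The function name is introduced here for illustration.

```python
def chance_success(options, items=1):
    """Probability of answering every one of `items` multiple choice
    items correctly by blind guessing, each item having `options`
    equally likely answer choices."""
    return (1 / options) ** items


print(chance_success(2))                # 0.5 for a true-false item
print(chance_success(4))                # 0.25 for a 4-choice item
print(round(chance_success(5), 2))      # 0.2 for a 5-choice item
print(round(chance_success(5, 3), 3))   # 0.008 for three 5-choice items
```

The same calculation shows the cost of the technique: each extra item per element cuts the chance factor by the number of options but multiplies the length of the test.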

In addition to the problem of chance response in using multiple
choice items for highly diagnostic purposes is the difficulty of
designing item stems and response options so as to eliminate the
possibility that the student will be able to use information unrelated
to the element tested in arriving at a correct response. For example, in
a multiple choice vocabulary item presumably testing a single lexical
item (the keyed answer), the student might be able to rule out the other
proposed answers as inappropriate without actually understanding the
meaning of the intended word.

The possibility that the student may, in many cases, be able to find
the correct response by following a linguistic path other than that
intended, together with the unavoidable statistical probability of
correct response on a purely fortuitous basis, renders multiple choice
techniques of dubious validity in testing situations which attempt to
certify the student's acquisition of discrete, highly specific elements
of course content. More appropriate formats for diagnostic testing
purposes would include the great number of fill-in or other "constructed
response" techniques presented and discussed in Lado (1961), Clark
(1972a, 1975a), and Valette (1977).

If it can be assumed that the student's response to a diagnostic test
item is an accurate indication of his mastery (or lack of it) of the
linguistic element in question, a test based on a number of such items
(each dealing with a different element) may be thought of not as a single
instrument for which one total score would be generated, but as a series
of individual, one-item tests, with a separate ("pass-fail") score
available for each item. Although such a high degree of diagnostic
specificity is possible in theory, the sheer data processing and
interpretation burden which the one-item, one-score principle imposes on
classroom teachers and students alike would make its full-scale
implementation unfeasible in most cases. For example, a 10-item
diagnostic test administered to 30 students would yield 300 separate
"scores." Administration of 15 such tests over the course of a school
year would produce 4,500 separate items of information on student
performance which would have to be tallied, reported, and interpreted.

An attempt at handling large quantities of diagnostic test data
through computer techniques has been made by Poulter (1969), who used
mark-sense cards as the student response medium for language laboratory
quizzes. The Center for Curriculum Development (1971) at one point
offered a computer-based scoring service for tests in its Voix et images
French program, in which individual item responses were stored and
retrieved for individual students and the class as a whole, together with
printed statements of the linguistic aspect tested in each case.
Computer-assisted test administration, scoring, and diagnostic reporting
has also been undertaken by Boyle et al. (1976). With few exceptions,
however, these and similar techniques are not readily available to the
classroom teacher.

An alternative to single-item reporting and interpretation is the
combining of several test items into broader units of content which still
retain some degree of diagnostic value. For example, 2 levels of score
reporting were provided for in an experimental test of "spoken Spanish
grammar" developed by Educational Testing Service for use in Peace Corps
language testing projects (Educational Testing Service, 1971). At the
more detailed level, each response was scored separately, permitting
diagnostic statements such as, "the student can produce the third person
singular present tense form of vivir in a single sentence context using
known vocabulary." At a second, more general level of scoring, a group
of related items was considered to constitute a "mini-test" for a
somewhat broader category (e.g., "present tense verb forms"), with the
score for each mini-test reported as the number of correctly answered
items within that category. This second technique permitted
identification of areas of student strength and weakness at the level of
"present tense verb forms," "possessive pronouns," "definite and
indefinite articles," and so forth. Although this second-level scoring
procedure did not yield the great informational detail of
individual-item scoring, the data handling aspects were considerably less
onerous, especially for group information accumulated across students
and test administrations.
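
The two reporting levels just described can be sketched in a few lines
of Python. The item identifiers, the item-to-category key, and the
function names are hypothetical; only the category labels are taken from
the text, and the sketch is not a reconstruction of the ETS scoring
program.

```python
from collections import defaultdict

# Hypothetical key assigning each item to a broader content category.
ITEM_CATEGORIES = {
    "item01": "present tense verb forms",
    "item02": "present tense verb forms",
    "item03": "present tense verb forms",
    "item04": "possessive pronouns",
    "item05": "possessive pronouns",
    "item06": "definite and indefinite articles",
}


def item_level_report(responses: dict[str, bool]) -> dict[str, str]:
    """First level: a separate pass/fail statement for every item."""
    return {item: ("pass" if ok else "fail") for item, ok in responses.items()}


def mini_test_report(responses: dict[str, bool]) -> dict[str, tuple[int, int]]:
    """Second level: (number correct, number of items) per category."""
    totals = defaultdict(lambda: [0, 0])
    for item, ok in responses.items():
        category = ITEM_CATEGORIES[item]
        totals[category][0] += int(ok)
        totals[category][1] += 1
    return {cat: (right, n) for cat, (right, n) in totals.items()}


if __name__ == "__main__":
    student = {"item01": True, "item02": True, "item03": False,
               "item04": False, "item05": False, "item06": True}
    for category, (right, n) in mini_test_report(student).items():
        print(f"{category}: {right}/{n}")
```

The second-level report shrinks six separate statements to three, which
is the trade of detail for manageability that the paragraph describes.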

An important conceptual and practical task in the future development
of diagnostically oriented tests of language achievement will be the
adoption of a level of specificity which permits a useful degree of
instructional feedback without exceeding the information processing
capabilities of the students or teaching staff. Despite the problems
involved in defining such a level, and in developing diagnostic
achievement tests generally, these efforts should be more than repaid by
the informational feedback which such instruments can provide in the
service of increased student motivation (Cartier, 1972; Marso, 1969;
Pack, 1972; and Steiner, 1970) and language course planning and
improvement (Parent and Veidt, 1971; Valette and Disick, 1972).

General Achievement Tests

General achievement testing is by definition a less specific and less 
highly controlled type of evaluation in which diagnostic precision gives 
way to the presentation of longer and more natural language sequences. 
Within this framework, the use of multiple choice formats is not
necessarily proscribed. Since the measurement focus is on whole-test
performance, the effects of chance response are diffused over the test
as a unit, and a certain proportion of the total test score is formally
or implicitly discounted as attributable to chance factors. The
possibility that students will take somewhat differing linguistic paths
to a correct answer is also of lesser concern because of the more global
measurement intent.
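
One conventional way of formally discounting chance is the
correction-for-guessing formula R - W/(k - 1), where R is the number of
items answered correctly, W the number answered incorrectly, and k the
number of options per item. The text does not say which discounting
method is intended, so the sketch below illustrates a common practice
rather than the procedure the author had in mind; the function name is
an assumption.

```python
def corrected_score(right: int, wrong: int, options: int) -> float:
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted items are counted as neither right nor wrong.
    """
    if options < 2:
        raise ValueError("an item needs at least 2 answer options")
    if right < 0 or wrong < 0:
        raise ValueError("counts cannot be negative")
    return right - wrong / (options - 1)


if __name__ == "__main__":
    # 50-item, 4-choice test: 38 right, 12 wrong, none omitted.
    print(corrected_score(38, 12, 4))
```

Under this formula a student who guesses blindly on every 4-choice item
is expected to earn a corrected score of zero, since each correct guess
is offset on average by three incorrect ones.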

Although diagnostic standards may be relaxed for tests of general
achievement, these tests must continue to be based on lexicon and
structures to which the student has been exposed in the course of
instruction. In this regard, the use of externally prepared standardized
instruments for achievement testing purposes must be conditioned on the
extent to which they incorporate the lexicon, structures, and other
elements of linguistic content presented in the course of instruction.
To the extent that test content and instructional content differ, the
usefulness of the external test as a measure of specific course
achievement is diminished. Carroll (1969) and Valette (1969)
independently raised this point in their discussion of testing results
for the so-called "Pennsylvania study" (Smith, 1970), in which scores on
the MLA Cooperative Foreign Language Achievement Tests (Educational
Testing Service, 1965) were used as criteria of accomplishment in
"audiolingual" and "traditional" courses.
Valette found that a large proportion of the vocabulary used in the
Cooperative Reading Test did not appear in the textbooks used by the
audiolingual classes and on the basis of this suggested that the
Cooperative Test was not a valid measure of achievement for the
audiolingual classes.

The problem of content in using externally prepared tests as measures
of course achievement can be obviated to some extent by careful prior
examination and selection of the test instruments. Cox and Sterrett
(1970) have suggested a statistical procedure in which only those test
items which are judged to reflect course content would be included in
calculating total test scores. This procedure would effectively remove
the influence of content-inappropriate items, and, provided that the
unscored items constituted only a small proportion of the total items in
the test, there would be little adverse effect on test reliability.
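
The core idea of scoring only course-relevant items can be sketched as
follows. The text does not spell out the statistical details of the Cox
and Sterrett procedure, so this shows only the filtering step; the item
labels and function names are hypothetical.

```python
def content_valid_score(responses: dict[str, bool],
                        taught_items: set[str]) -> tuple[int, int]:
    """Return (number correct, number scored) over course-relevant items."""
    scored = [ok for item, ok in responses.items() if item in taught_items]
    return sum(scored), len(scored)


def unscored_proportion(responses: dict[str, bool],
                        taught_items: set[str]) -> float:
    """Share of the test's items that are left out of the total score."""
    if not responses:
        raise ValueError("no responses supplied")
    unscored = sum(1 for item in responses if item not in taught_items)
    return unscored / len(responses)


if __name__ == "__main__":
    answers = {"q1": True, "q2": False, "q3": True, "q4": True}
    judged_relevant = {"q1", "q2", "q3"}   # q4 tests untaught material
    print(content_valid_score(answers, judged_relevant))
    print(unscored_proportion(answers, judged_relevant))
```

Reporting the unscored proportion alongside the score makes it easy to
check the condition stated above, namely that the excluded items form
only a small part of the test.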

EVALUATION OF PROFICIENCY

The purpose of proficiency testing is to determine the student's ability
to use the test language effectively for "real-life purposes," that is
to say, in vocational pursuits, for travel or residence abroad, or for
such cultural and enjoyment purposes as reading literary works in the
original text, attending motion pictures or plays in the test language,
and so forth. In all cases, the measurement emphasis is on the extent
to which the individual is capable of utilizing his knowledge of, and
facility in, the language to accomplish some desired receptive or
communicative purpose. In contrast to achievement testing, which is
explicitly based on the nature and content of the student's language
learning history, proficiency testing focuses entirely on the examinee's
ability to perform pragmatically useful tasks in the language, without
regard to the manner in which that ability was acquired.

Within the proficiency testing category, it is possible to distinguish
direct and indirect procedures. From a theoretical standpoint, the most
direct procedure for determining an individual's proficiency in a given
language would simply be to follow that individual surreptitiously over
an extended period of time, observing and judging the adequacy of
performance in the language-use areas in question: buying train tickets,
ordering a meal, conferring with colleagues on work-related matters,
conversing with friends on topics of current interest, writing a note
for the plumber, ordering business supplies by correspondence, and so
forth. It is clearly impossible, or at least highly impractical, to
administer a "test" of this type in the usual language learning
situation. Nonetheless, the development of proficiency measurement
procedures that can properly be considered "direct" must be based on
approximating, to the greatest extent possible within the necessary
constraints of testing time and facilities, the situations in which
the proficiency is called upon in real life.

The formal correspondence between the setting and operation of the
testing procedure and the setting and operation of the real-life
situation constitutes the face/content validity of the test--the basic
psychometric touchstone for direct proficiency tests. The face/content
validity of a given instrument must be determined by close examination
and analysis of the testing materials and procedures themselves, and
this determination is necessarily logical and judgmental, rather than
statistical, in nature. This concept may be somewhat disturbing to
statistically oriented test developers and users, who might prefer some
numerical index of validity, perhaps a "coefficient of face/content
validity" analogous to the predictive validity coefficients associated
with prognostic tests. However, since a direct proficiency test is, in
effect, its own criterion, it must necessarily be evaluated by informed
inspection rather than through statistical means. The judgmental nature
of face/content validity should not in any way be considered a
disparagement of this validation process: as succinctly expressed by
Rulon (1946), "[face/content validity] sounds as though it were a rather
superficial thing; as though we should require some more conclusive
proof of the test's validity. Actually, there can be no more conclusive
proof (p. 290)."

Direct proficiency testing can encompass the measurement of student
skill in any of the 4 language modalities. For example, the Graduate
School Foreign Language Tests provide measures of the student's
proficiency in reading verbatim excerpts from journal articles and other
texts appropriate to given areas of graduate school specialization.
Direct measures of writing proficiency require the student to produce
materials such as business letters, reports, and other documents at
issue in specified real-life writing situations. For purposes of
discussion, it will be useful to concentrate on an area of language
proficiency of high current interest to students, teachers, and language
testers: the ability to communicate orally in face-to-face language
situations.

As has been emphasized by a number of authors (Cooper, 1970;
Jakobovits, 1969, 1970; Paulston, 1974; Spolsky, 1968; and Upshur, 1972),
an individual's ability to communicate effectively in a given language
cannot be considered directly proportional to his mastery (or lack of
it) of specified lexical items, grammatical structures, or other discrete
elements of performance. As a consequence, instead of using linguistic
inventories as a point of departure for setting test specifications, the
developer of a communicative proficiency test must be concerned with
arranging testing situations that are the closest possible facsimiles of
real-life communication situations. Instead of evaluating the linguistic
accuracy per se of the examinee's performance, the tester must
concentrate on determining the extent to which the examinee is able to
convey various types of information in an accurate, efficient, and
situationally appropriate way.

With respect to appropriate settings for a direct test of
communicative proficiency, the presence of a live interlocutor is
probably indispensable for adequate face/content validity. Published
speaking tests using tape-recorded stimuli to which the student replies
are some steps removed from a real communicative situation in that they
do not allow for the speaker interaction and instantaneous alteration of
content characteristic of face-to-face conversation. Except for
telephone conversations and other communication-at-a-distance
situations, real-life oral communication also involves facial, gestural,
and other visual cues which cannot be provided in a test situation
except on a face-to-face basis.

However, the mere fact of a face-to-face conversation is not of itself
a sufficient demonstration of validity: close attention must also be
paid to the topical content of the conversation and to the
psychological/interpersonal relationships that are established during
the course of the test. It is probably futile to hope that the affective
components of a formal testing situation will ever closely approach
those of the real-life situations which the test attempts to reflect. As
Perren (196?) expresses it: "...both participants know perfectly well
that it is a test and not a tea-party, and both are subject to
psychological tensions, and what is more important, to linguistic
constraints of style and register thought appropriate to the occasion by
both participants." Nonetheless, for the sake of test validity, every
effort must be made to minimize the "examination" aspects of the
encounter in favor of a more natural and encouraging ambiance.

Scoring Procedures

In addition to the validity of the test setting and administration
procedure, there is also the question of validity of the scoring
procedures used. The degree of scoring validity depends on the extent to
which the scoring system represents examiner judgments of the student's
ability to convey information in an efficient and situationally
appropriate way, rather than the grammatical accuracy, correctness of
vocabulary, or other linguistically oriented aspects of the student's
performance. It is, however, often difficult to separate "communication"
and "linguistic accuracy" in scoring practice. For example, the scoring
criteria for "level 2" performance on the Foreign Service Institute's
language proficiency interview (Rice, 1959) are reproduced below, with
emphasis (italicizing) added to indicate those portions involving
judgments of linguistic accuracy rather than of communicative
performance as such:

Can handle with confidence but not with facility most social
situations including introductions and casual conversations
about current events, as well as work, family, and
autobiographical information; can handle limited work
requirements, needing help in handling any complications or
difficulties; can get the gist of most conversations on
non-technical subjects (i.e., topics which require no specialized
knowledge) and has a speaking vocabulary sufficient to express
himself simply with some circumlocutions; accent, though often
quite faulty, is intelligible; can usually handle elementary
constructions quite accurately but does not have thorough or
confident control of the grammar.

A similar intermingling of communicative and linguistic criteria is
seen in the description of "elementary speaking proficiency" on the
English Proficiency Chart of the National Association for Foreign
Student Affairs (again, emphasis added):

Asks and answers questions on daily personal needs and familiar
topics with very limited vocabulary; makes frequent [...] errors
in structure and pronunciation.

In addition to the validity question for scoring procedures is the
problem of scoring reliability. Although scoring reliability is not a
significant problem in testing procedures based on multiple choice or
short response formats, it assumes substantial proportions in situations
where human judges must assign numerical ratings to longer and less
highly structured samples of language behavior. Two types of scoring
reliability are at issue in such instances: intra-rater reliability,
which refers to the extent to which a given rater is able, repetitively,
to assign the same score to a given test performance, and inter-rater
reliability, which refers to the extent to which 2 or more raters assign
the same score to a given performance. Low intra-rater reliability is a
serious problem, since it indicates that the standards of scoring
judgment are not stable even among individual judges. Low inter-rater
reliability is also a troublesome matter in that the obtained score
becomes dependent in large part on the rater involved: examinees who
happen to draw a more lenient rater stand to benefit by comparison with
examinees whose performance is evaluated by a more severe rater.

Studies of intra- and inter-rater scoring performance have primarily
been conducted in the area of written production or "essay testing" in
the examinee's native language, as comprehensively reviewed by Coffman
(1971). Tests of oral communication in either native or second language
pose technical problems in that it is substantially more difficult to
"re-present" the student's performance for repetitive scoring by the
original rater or other independent raters (Clark, 1975b); this
situation has doubtless contributed to the general paucity of scoring
reliability studies in the speaking proficiency area. As reported in the
test manual, an inter-rater reliability study of 100 speaking test tapes
in the MLA-Cooperative Test battery yielded a 2-rater correlation of
.59, using a scoring procedure based primarily on judgments of
linguistic accuracy rather than on overall communicative ability. In a
small-scale reliability study of the FSI interview carried out at
Educational Testing Service, the scores of 2 raters simultaneously
present at 80 interview sessions coincided as to basic score level (on a
6-point scale) in approximately 95% of the cases. Notwithstanding the
information provided by occasional limited studies of this type, the
scoring reliabilities of direct proficiency interviews (and of tests of
speaking ability in general) remain to be comprehensively investigated
and documented.
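
The two studies cited above report inter-rater reliability in different
ways: one as a correlation between two raters' scores, the other as the
percentage of cases in which the raters assigned the same level. A
minimal Python sketch of both indices follows. The ratings are invented
for illustration and the function names are assumptions.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two raters' scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("each rater's scores must vary")
    return sxy / sqrt(sxx * syy)


def exact_agreement(x: list[float], y: list[float]) -> float:
    """Proportion of performances given the identical score by both raters."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equally long, non-empty lists")
    return sum(a == b for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    lenient = [1, 2, 3, 4, 2, 3]
    severe = [0, 1, 2, 3, 1, 2]   # invented: always one level lower
    print(round(pearson_r(lenient, severe), 2))
    print(exact_agreement(lenient, severe))
```

The invented example shows why the two indices should not be read
interchangeably: a consistently severe rater can correlate perfectly
with a lenient one while never assigning the same level, which is the
leniency problem described earlier.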

Several procedures might be suggested to increase the scoring
reliability of direct tests of communicative proficiency. For example,
the interviewers could be asked to cover specified topical areas or to
ask a fixed series of questions of each examinee. Various scoring aids
could also be devised, such as the "Factors in Speaking Proficiency"
chart developed by FSI staff for use along with the original verbal
ratings of competence (Wilds, 1975). This chart breaks down the student's
performance into the categories of pronunciation, grammar, vocabulary,
fluency, and comprehension. Weighted scores are assigned to each
category, and the student's final rating is derived from a total score
across categories which is then reconverted to the original verbal scale.
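
The mechanics of such a chart (category ratings, fixed weights, a
weighted total, and reconversion to the overall scale) can be sketched
as follows. The weights, the 1-6 rating range for each category, and the
conversion thresholds are all hypothetical placeholders, since the text
does not reproduce the actual FSI values.

```python
# Hypothetical weights; the actual FSI weighting table is not given here.
WEIGHTS = {
    "pronunciation": 1,
    "grammar": 3,
    "vocabulary": 2,
    "fluency": 2,
    "comprehension": 2,
}

# Hypothetical conversion table: (minimum weighted total, overall level).
CONVERSION = [(55, 5), (45, 4), (35, 3), (25, 2), (15, 1)]


def weighted_total(ratings: dict[str, int]) -> int:
    """Sum of the category ratings (each 1-6) times the category weights."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the five categories")
    if any(not 1 <= r <= 6 for r in ratings.values()):
        raise ValueError("each category rating must be between 1 and 6")
    return sum(WEIGHTS[cat] * r for cat, r in ratings.items())


def final_rating(ratings: dict[str, int]) -> int:
    """Reconvert the weighted total to a level on the overall scale."""
    total = weighted_total(ratings)
    for minimum, level in CONVERSION:
        if total >= minimum:
            return level
    return 0


if __name__ == "__main__":
    examinee = {"pronunciation": 3, "grammar": 2, "vocabulary": 4,
                "fluency": 3, "comprehension": 4}
    print(weighted_total(examinee), final_rating(examinee))
```

Because the weights are fixed in advance, two examinees who communicate
equally well but differ in grammatical accuracy will receive different
totals, which is the kind of effect on the final rating discussed in the
paragraph that follows.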

Although a high degree of scoring reliability for direct proficiency
tests is certainly an important goal, such reliability should not be
sought at the expense of face/content validity. In the first example
above, detailed advance structuring of the content and sequencing of the
interview would make it less representative of the often digressive
conversations typical of real-life communication. In the second example,
the compartmentalization of the student's responses into different
linguistic categories for scoring purposes, together with the assignment
of fixed score weightings to each category, could be expected to do some
violence to the final rating as a direct measure of communicative
proficiency.

Although it may be hoped that testing procedures can eventually be
developed which combine a high degree of face/content validity with high
scoring reliability, [...]

Indirect proficiency tests are also intended to assess the extent to
which the student is able to function appropriately in real-life
language use situations. However, unlike direct proficiency tests,
indirect measures are not required to reflect authentic language use
contexts and, indeed, they may in many cases bear little formal
resemblance to linguistic situations that the student would encounter in
real life. One example of an indirect proficiency measure is the
"reduced redundancy" test developed by Bernard Spolsky (Gaies et al.,
1977; Spolsky, 1972; and Spolsky et al., 1968). In this procedure, the
student is asked to listen to and transcribe a series of sentences in
the test language which are accompanied by a specified degree of
electronically produced background noise. Development of this test is
based on the theory that individuals who have a high level of global
proficiency in the language will be better able to utilize the reduced
number of linguistic cues available in acoustically distorted speech and
will thus be able to perceive the recorded stimuli accurately at lower
signal/noise levels. Although a case might be made for the existence of
a limited number of "reduced redundancy" situations in real-life
contexts (for example, telephone reception over faulty equipment), the
Spolsky technique does not, in general, reflect the kinds of
language-use situations in which the student would be expected to
operate in real life.

The usefulness of the Spolsky test and other indirect proficiency
measures does not, however, depend on the tests' face/content validity
but on the extent to which the test scores are found to correlate, on a
statistical basis, with more direct measures of the proficiency in
question, simultaneously administered during the test validation phase.
The practical utility of these concurrent validity studies and the
obtained correlations lies in the extent to which it thereby becomes
possible to predict, on the basis of an examinee's indirect test score
alone, the score that he or she would be expected to obtain on the more
direct measure when the latter cannot be administered for reasons of
cost or complexity of administration.
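
The prediction step described above is ordinarily carried out with a
least-squares regression line fitted during the validation phase. The
sketch below assumes simple linear regression of the direct score on the
indirect score; the function names are illustrative, and the example
scores are invented.

```python
def fit_line(indirect: list[float], direct: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for predicting direct scores."""
    if len(indirect) != len(direct) or len(indirect) < 2:
        raise ValueError("need paired scores for at least 2 examinees")
    n = len(indirect)
    mean_x = sum(indirect) / n
    mean_y = sum(direct) / n
    sxx = sum((x - mean_x) ** 2 for x in indirect)
    if sxx == 0:
        raise ValueError("indirect scores must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indirect, direct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_direct(indirect_score: float, slope: float, intercept: float) -> float:
    """Expected direct-measure score for a new examinee."""
    return slope * indirect_score + intercept


if __name__ == "__main__":
    # Invented validation sample: indirect test scores and interview levels.
    indirect_scores = [12, 18, 25, 31, 40]
    interview_levels = [1, 1, 2, 3, 4]
    slope, intercept = fit_line(indirect_scores, interview_levels)
    print(round(predict_direct(28, slope, intercept), 2))
```

How far such a prediction can be trusted depends on the size of the
obtained correlation and on how closely the new examinee resembles the
validation group.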

An indirect testing technique which has received considerable recent
attention is the so-called "cloze" procedure, in which the examinee
attempts to replace words previously deleted from a continuous text.
Originated by W. L. Taylor (1953) in connection with native language
learning, the cloze procedure was intensively examined by Carroll et al.
(1959) as a possible measure of foreign language proficiency within the
College Entrance Examination Board testing program. In the Carroll
study, only moderate reliabilities were found for French and German cloze
tests based on an every-10th-word deletion pattern, and their operational
use in the College Board program was not recommended. More recently, a
number of additional investigations have been carried out using various
adaptations of the original cloze procedure. Darnell (1968) developed
a "clozentropy" procedure in which the test responses of native speakers
were used to define and weight acceptable answers according to an
information theory model. A 200-item test using the clozentropy technique
was found to correlate to the extent of .84 with scores on the Test of
English as a Foreign Language (TOEFL) for a group of 48 non-native
speakers of English studying at the University of Colorado. A major
disadvantage of the Darnell approach [...] lies in the complex scoring
procedure.
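
The basic fixed-ratio deletion used in the cloze procedure is simple to
mechanize. The following Python sketch deletes every nth word of a
passage and keeps the deleted words as an answer key. The function name
and blank marker are assumptions, and punctuation attached to a deleted
word is removed along with it, which a production version would need to
handle.

```python
def make_cloze(text: str, every: int = 10, blank: str = "____") -> tuple[str, list[str]]:
    """Delete every `every`-th word; return the gapped text and answer key."""
    if every < 2:
        raise ValueError("deletion interval must be at least 2")
    words = text.split()
    key = []
    for i in range(every - 1, len(words), every):
        key.append(words[i])
        words[i] = blank
    return " ".join(words), key


if __name__ == "__main__":
    passage = ("The examinee reads a continuous text from which words have "
               "been removed at regular intervals and tries to restore them")
    gapped, answers = make_cloze(passage, every=5)
    print(gapped)
    print(answers)
```

Variants such as the preposition-only deletion mentioned below would
replace the fixed interval with a rule that selects words by class.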

In other studies, Oller and Inal (1971) administered an English cloze
test in which only prepositions were deleted and obtained a correlation
of .75 with total scores on the UCLA English placement examination,
consisting of multiple choice and free response questions covering
vocabulary, grammar, reading comprehension, and dictation. Oller (1972b)
found that a scoring system which gave credit for any contextually
acceptable word (not necessarily the original deletion) resulted in
higher test reliability than the exact-word-replacement method, as well
as a higher (.8? vs. .75) correlation with the UCLA placement
examination. Superiority of the "any-acceptable-word" scoring procedure
was not, however, corroborated in a later study by Hanzeli (1977). Recent
experimentation has also been conducted using the cloze procedure in a
multiple choice format (Jonz, 1976; Griffin et al., 1978).
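
The difference between the two scoring methods compared above can be
made concrete in a short Python sketch. The sets of acceptable words
would in practice come from judges or native-speaker responses; here
they are invented, and the function names are assumptions. Matching is
case-insensitive, and the acceptable sets are assumed to be stored in
lower case and to include the original word.

```python
def score_exact(responses: list[str], key: list[str]) -> int:
    """Exact-word method: credit only for restoring the deleted word."""
    return sum(r.strip().lower() == k.lower() for r, k in zip(responses, key))


def score_acceptable(responses: list[str], acceptable: list[set[str]]) -> int:
    """Any-acceptable-word method: credit for any contextually fitting word."""
    return sum(r.strip().lower() in ok for r, ok in zip(responses, acceptable))


if __name__ == "__main__":
    key = ["on", "quickly", "house"]
    acceptable = [{"on", "upon"}, {"quickly", "rapidly", "fast"}, {"house", "home"}]
    student = ["upon", "quickly", "garden"]
    print(score_exact(student, key))
    print(score_acceptable(student, acceptable))
```

The acceptable-word score can never be lower than the exact-word score
for the same responses, so the two methods differ in how finely they
separate examinees rather than in the direction of the result.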

The correlational results so far obtained with indirect proficiency
measures, especially the cloze procedure, allow for some optimism that
techniques can be developed to estimate, in an efficient and economical
manner, the student's acquisition of "real-life" proficiencies that are
directly measurable only through more elaborate and more expensive
techniques. However, some cautionary observations should also be made,
as follows:

• With a few exceptions (Pike, 1973; Hinofotis, 1976), experimental
studies involving indirect procedures have used as comparison measures
multiple choice tests or other instruments that do not in themselves
have a high degree of face/content validity as direct measures of the
language proficiencies in question. As has been frequently urged in the
testing literature (Carroll et al., 1959; Clark, 1972b, 1975b; Lado,
1960; and Spolsky, 1968), it would be very desirable to carry out
detailed studies in which direct language interviews and other highly
face-valid techniques would serve as the criteria against which
perception-in-noise tests, cloze tests of various types, and other
experimental measures could be correlated and operationally compared. In
addition to permitting close examination and comparison of proposed
indirect measures, such investigations would focus attention on, and
quite probably bring improvements to, the direct measurement techniques
themselves.

• The magnitude of the correlation between indirect and direct
proficiency measures may be affected by the specific language learning
history of the individuals tested. Although high correlations between a
printed cloze test and a direct measure of oral proficiency might be
obtained for examinee groups whose language experience has included
routine contact with both spoken and written materials, the same
relationship might not be shown for examinee groups having other
learning backgrounds. In this regard, the oral proficiency level of
students whose language training has been largely restricted to either
the spoken mode (as in some Peace Corps training situations) or the
written mode (as in reading-knowledge-only courses) might be under- or
over-predicted, respectively, using correlational results obtained from
student groups with more heterogeneous language backgrounds. Additional
investigation of the influence of diverse learning histories on indirect
test performance would appear indicated, as well as more global studies
of the interrelationships among language modalities as they affect test
performance generally.

• The "representational value" of indirect proficiency tests is much
less than that of direct proficiency tests. Whereas the amount and
quality of language accomplishment represented by a given level of
performance on a direct proficiency test is readily apparent to the
examinee and other interested persons, the same cannot be said of
indirect test scores. When used in the classroom and other instructional
situations, indirect proficiency testing should properly be accompanied
by explanatory materials which permit an appropriate extrapolation of
the test results to specific types and levels of real-life performance.

SUMMARY 

The discussions in the preceding sections have dealt with the major
psychometric considerations at issue in 3 types of language testing
activities. Prognostic measurement involves the use of test instruments
or other available measures to determine, through correlational
techniques, the level of language accomplishment that specified students
would be expected to attain if they were to follow particular
instructional programs. The MLAT and similar language aptitude tests can
contribute to the effective measurement of at least one component of
language learning success, but a potentially more effective prognostic
technique involves a "systems approach" based on the Carroll model of
school learning, in which several student-related and
instruction-related variables are considered simultaneously in
estimating, for a given student, the probable outcome of a number of
alternative learning programs and strategies. Improvement in the quality
of the criterion tests used to define language learning success is also
considered of great importance in the continuing development of
prognostic techniques.

In the area of achievement testing--defined as the measurement of
the student's acquisition of course content--a major psychometric concern
is that of appropriately sampling that content within the confines of an
administratively feasible test. A suggested empirical approach to the
content question is the establishment of operationally homogeneous
content domains from which individual elements can be drawn for testing.
Diagnostically oriented achievement tests attempt to certify the
student's acquisition (or lack of it) of discrete, minutely specified
content elements. Multiple choice techniques are not considered well
suited to diagnostic testing activities because of statistical and
logical factors which render single-item data ambiguous as to the
student's mastery of the point ostensibly tested. Completion exercises
and other techniques requiring actual language production are considered
more appropriate for highly diagnostic testing, even though they lack
the scoring speed and convenience of the multiple choice format.
Practical difficulties in handling the large amounts of data generated
by diagnostic testing at its most highly specific level may make it
necessary to combine tested elements into somewhat larger units for
scoring and interpretation purposes.

General achievement testing is a more global type of measurement in
which larger and more natural linguistic units can legitimately be
presented or elicited. The basic requirement in achievement testing--that
the test content be derived exclusively from course content--is applicable
to general achievement tests as well as to diagnostic tests. In this
respect, the appropriateness of using externally-prepared instruments for
general achievement testing depends in large part on the extent to which
the external instruments adequately mirror the content of the language
program in which they are to be used.

Proficiency testing involves measuring the student's ability to
utilize the tested language for pragmatically useful purposes within a
real-life context. Direct proficiency testing, discussed primarily in
terms of the testing of face-to-face communicative proficiency, represents
an attempt to duplicate the real-life language use situation as closely
as possible within the test setting, as a demonstration of high face/
content validity. Scoring procedures for direct proficiency tests must
demonstrate real communicative criteria and a high degree of both
intra-rater and inter-rater reliability.

Indirect proficiency tests have the same measurement purpose as
direct proficiency tests but derive validity as proficiency measures
through a correlational relationship with direct proficiency tests,
rather than through the face/content validity of the instruments
themselves. "Reduced redundancy" tests and various types of cloze
procedures are examples of indirect measures which show considerable
promise as indices of language proficiency in situations where more direct
tests cannot be administered. However, direct proficiency tests will
continue to be needed, both for administration in their own right wherever
possible and as criteria against which the adequacy and accuracy of more
indirect procedures can be established.

FOOTNOTES

1. Negative correlations ranging from zero to -1.0 are also possible; these
usually occur when the scoring scale for one of the correlated measures
is reversed so that better performance is represented by lower scores.
Negative correlations have just as much predictive value as the
corresponding positive values.

2. Subsequent to the publication of the MLAT, a Language Aptitude
Battery, based on generally similar principles, has been made available
(Pimsleur, 1966).

3. For useful discussions of alternative procedures in defining course
content, see George (1962) and Perren (1971).

4. For a fairly technical but highly useful further discussion of
domain-based achievement testing, see Shoemaker (1975).

5. Various "correction-for-guessing" procedures have been developed in
an attempt to minimize the statistical effect of random guessing on
multiple choice test scores. However, these involve adjustments of the
student's total test score and in no way counteract the possibility of
correct answering by chance at the individual-item level. Test
instructions which caution the student not to guess and warn of a penalty
for wrong answers may be taken at face value by some students but
disregarded by others more willing to risk an incorrect response. For a
review of the literature in these areas, see Diamond and Evans (1973).

6. For additional discussion of this point, see Clark (1965).

7. "Item" is used here in the general sense of "test question"--including
short answer, completion, and other question types--and not multiple
choice questions per se, which are seen to have drawbacks for diagnostic
testing.

8. Published as an ongoing series by Educational Testing Service,
Princeton, New Jersey.

9. See National Association for Foreign Student Affairs (1971).

10. An extensive bibliography on cloze testing in English-second language
applications is provided by Oller (1975b).

Joshua A. Fishman & Robert L. Cooper



It is the purpose of this article to illustrate the usefulness of a
sociolinguistic approach to the construction of language assessment
procedures. This approach insists upon the specification of the
communicative contexts in which the behavior to be assessed occurs, and
can be justified on two grounds. First, language behavior and behavior
toward language vary as a function of communicative context. Thus, global,
uncontextualized measures of language proficiency, language usage, and
language attitude may mask important systematic differences. Second,
language assessment procedures have been successfully contextualized so as
to gather data reflecting systematic sociolinguistic variation. This
article presents evidence to support both justifications for the
sociolinguistic contextualization of language assessment procedures.

SOCIOLINGUISTIC VARIATION 

Sociolinguistics describes a loosely-associated set of inquiries which
have as a common concern the relationships between linguistic and other
social variables. (For a collection of essays reviewing the field, see
Fishman, 1971.) While investigators who attempt to describe and explain
these interrelationships differ in their orientations, all agree on at
least one point: there are no single-style speakers and no single-style
speech communities. That is to say, no one speaks in the same way all of
the time, and no community is composed of speakers who all have identical
verbal resources at their command. Thus, a person speaks differently
when shouting at an umpire at Yankee Stadium than when delivering a formal
lecture. Similarly, not all the people who "speak the same language"
have the same opportunities for using it. Whereas most New Yorkers may
have the opportunity to go to Yankee Stadium and to learn how to display
their grievances at an umpire's call, fewer New Yorkers have the
opportunity (and far fewer the desire) to learn how to deliver formal
lectures.

Several examples can be cited to support this notion. Labov (1966),
for example, demonstrated that in the speech of New Yorkers the presence
of final or preconsonantal /r/ in words such as car and park is
systematically related to the social class of the speaker and to the
carefulness of his speech. Thus, New Yorkers belonging to the upper end of
the socioeconomic continuum produce this sound more often than do New
Yorkers from the lower end of the continuum, and all New Yorkers pronounce
it more frequently when they speak carefully than when they speak casually
and spontaneously.

Similarly, Fischer (1958) found that variation in the pronunciation
of the present participle -ing by children of a New England village was
systematically related to contextual and personal variables. The variant
-in' (as in huntin' and fishin') was more likely to be produced in
relaxed than in formal situations, in verbs like hunting and fishing
than in verbs like reading and writing, by boys more often than by girls,
and by "typical" boys more often than by a "model" boy. He found a
slight tendency for children from less economically advantaged families
to use -in' more often than children from more favorable economic
circumstances, but the community was relatively homogeneous with respect
to social class.

Variation in the use of American terms of address has been shown to
be systematically related to differential power relations and to degree
of intimacy between speakers (Brown and Ford, 1961). Thus, for example,
two Americans are more likely to use each other's first names when
talking to each other if they have a close relationship and to use mutual
title plus last name (e.g. Mr. Smith) if they do not. Non-reciprocal
use of the first name is more likely to be found in situations of unequal
power relations (e.g. employer-employee), with the more powerful
addressing the less powerful by the latter's first name and receiving
title plus last name from him. Also, the more powerful is typically the
one who initiates a change from non-reciprocal to reciprocal use of the
first name.

Another example of variation in American English can be seen in baby
talk (speech addressed to infants), which is marked by a small set of
lexical items, many of which involve reduplication (e.g. choo-choo,
bow-wow) and special intonational contours and syntactic features
(Ferguson, 1964). Americans consider baby talk appropriate for use with
babies, pets, and lovers, but many, particularly men, feel embarrassed at
using it in public. That Americans talk to infants (and are often tireless
in their attempts to elicit speech from them) is itself culturally
determined. Other groups, the Luo of Kenya, for example, believe it
inappropriate to try to elicit speech from infants (Blount, 1972).

The rich collection of speech events in which inner-city American
Black adolescent boys are skilled (Labov et al., 1968) presents another
example of sociolinguistic variation. These events include ritual
insults (playing the dozens), the recitation of epic poems (jokes or
toasts), and the formal display of occult (heavy) knowledge (rifting).
Each of these is stylistically marked. Rifting, for example, requires a
high-flown rhetorical style and employs many learned, Latinate words, and
its syntax is closer to that of Standard English than is the syntax of
other speech events.

All of the above examples illustrate the sociolinguistic universal
originally asserted--that there are no single-style speakers or
single-style speech communities. Social and contextual variables are
reflected by differences in phonology, morphophonemics, lexicon, and
syntax. They are also reflected by differences in what is said. Membership
in a speech community is marked not only by a shared language or language
variety, but also by shared rules for its use. Thus, members of a speech
community share not only linguistic competence, the ability to understand
and produce the theoretically infinite set of sentences comprising the
language, but also communicative competence, the knowledge of when to
speak and when to remain silent, and what to say to whom and when (Hymes,
1972). Thus, for example, one of the first things American students of
Hindi or Marathi want to learn to say in those languages is please and
thank you because of the importance of these terms in American English
(Apte, 1974). They are among the first terms which American parents try to
teach their children, and they are the terms used constantly by adults.
As Apte has shown, however, the use of gratitude expressions is culturally
determined. If an American wants to speak Hindi or Marathi appropriately,
he must learn that in the Hindi and Marathi speech communities there are
some communicative contexts in which expressions of gratitude are
obligatory, there are others in which they are optional, and there are
still others in which they are taboo. He must learn when to express
gratitude and, equally important, when not to express it. And he must
learn this as part of learning these languages' rules of speaking, which
will enable him to communicate appropriately as well as grammatically.

It is not difficult to demonstrate the systematic variation that
exists in the same speaker's language usage or the systematic differences
that exist between the language usages of different groups of speakers.
If the obvious has been belabored, it has been because such variation is
typically ignored in the construction of language assessment devices.
Most writers of such devices appear to view language as a monolithic
entity and to have adopted the simplifying assumption of an ideal
speaker-hearer who speaks the same way all of the time and does so within
a linguistically homogeneous community. Such an assumption is justified
when the language proficiency, language usage, or language attitude to be
assessed is contextually invariant. Certainly, there are invariant
behaviors which are worth assessing. For example, we may want to predict
the speed and accuracy with which a person can translate articles in
psychological journals from his mother tongue into a given target
language. Or we may want to predict the degree to which a university
student will be able to comprehend lectures in his field when the lectures
are delivered in a given language. Yet even these examples are not
illustrative of completely invariant behaviors. Articles in particular
psychological journals or on given psychological topics may require
somewhat different skills than other articles, and lectures given by
particular instructors or on particular topics may require somewhat
different skills than other lectures. But if the contextual variability of
the behavior we wish to assess is relatively small, we may be justified in
making the simplifying assumption of invariance.

Whether or not the assumption of contextual invariance is justified
in a particular case, the assumption is typically an unexamined one. It
is the point of this article that language assessment procedures can be
improved if the assumption of contextual invariance (or variance) is
made explicit. This can be done by specifying the communicative contexts
for which the language behavior or behavior toward language is being
assessed. If there is only one context for which the assessment is
necessary or if the assessment is for substantially similar contexts, the
procedure can be designed with that context or set of contexts in mind. If
there are several contexts for which assessment is necessary and if these
contexts have substantially different communicative requirements, the
procedure can be designed to reflect this sociolinguistic variation.

EXAMPLES OF CONTEXTUALIZED LANGUAGE-ASSESSMENT MEASURES

Although most language assessment devices appear to have been written on
the implicit assumption that the behavior to be assessed is monolithic
and contextually invariant, a beginning has been made in constructing
contextualized assessment devices. Examples follow for the measurement
of language proficiency, language usage, and language attitude.

Language Proficiency

Two language proficiency devices developed in connection with the
description of bilingualism among Puerto Ricans in New York City (Fishman
et al., 1971) illustrate the usefulness of a contextualized approach to
the measurement of language proficiency. One procedure was a word-naming
task administered in English and in Spanish. Respondents were asked to
name, in one minute, as many different words referring to a specified
context as they could. This was done in each language for each of five
contextual domains: family, neighborhood, religion, education, and work.
For the domain of family, respondents were asked to name as many words
as they could that named things which could be seen or found in a kitchen;
for neighborhood, things seen or found in a neighborhood; for religion,
things seen or found in a church; for education, subjects taught in
school; and for work, the names of jobs, occupations, or professions.
Responses were elicited for all five domains in one language, followed by
all five domains in the other. The language in which responses were
first elicited was randomly chosen for each respondent. The order of
domains, however, was kept constant, i.e. family, neighborhood, religion,
education, and work. Directions were of the pattern, "Tell me as many
English (Spanish) words as you can that name things you can see or find in
a kitchen--your kitchen or any other kitchen. Words like salt (sal),
spoon (cuchara), rice (arroz)." The task was individually administered
to 38 adults.

When the average number of Spanish words produced was compared to
the average number of English words produced, when summed across all five
domains, no difference was found. On this basis the respondents could
be called "balanced" bilinguals since their total, global performance
was the same in each language. However, differences were observed between
the average English and Spanish scores obtained for several domains. For
example, more Spanish than English words were named for the contexts of
family and religion. To describe the performance of the group as a whole
would be misleading, however, for significant subgroup differences were
observed when the respondents were divided by age and length of residence
on the mainland of the United States. These subgroups, like the group
as a whole, appeared "balanced" in terms of their total English and
Spanish scores. However, differences were observed between the subgroups
in the pattern of their relative language proficiency as exhibited by
domain. For example, school-aged respondents who had received their
formal education via the medium of English showed a significantly higher
education score in English than in Spanish, whereas the school-aged
respondents who had received their education via both languages showed no
significant difference between their average language scores for that
domain. Thus, the word-naming task revealed important proficiency
differences associated with different interactional domains, and these
differences could be explained in terms of the respondents' differential
use of English and Spanish in their everyday life. These proficiency
differences, however, would have been completely hidden if only a global,
undifferentiated measure had been used.
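
The contrast between a global score and domain-level scores can be made
concrete with a small sketch. The Python below is illustrative only: the
word counts are invented (they are not the figures from the Puerto Rican
study), and the function names are hypothetical. It shows how Spanish and
English totals can be identical while the per-domain differences are not.

```python
# Illustrative sketch with invented mean word counts per domain.
DOMAINS = ["family", "neighborhood", "religion", "education", "work"]


def domain_differences(spanish, english):
    """Return the Spanish-minus-English difference for each domain."""
    return {d: spanish[d] - english[d] for d in DOMAINS}


def global_difference(spanish, english):
    """Return the Spanish-minus-English difference of the summed totals."""
    return sum(spanish[d] for d in DOMAINS) - sum(english[d] for d in DOMAINS)


# Hypothetical data: totals are equal (48 words in each language).
spanish = {"family": 12, "neighborhood": 10, "religion": 9,
           "education": 8, "work": 9}
english = {"family": 9, "neighborhood": 10, "religion": 7,
           "education": 12, "work": 10}

diffs = domain_differences(spanish, english)
total = global_difference(spanish, english)

print("global difference:", total)
for domain in DOMAINS:
    print(domain, diffs[domain])
```

With these invented counts the summed difference is zero, so a global
measure would label the group "balanced," while the per-domain differences
favor Spanish for family and religion and English for education.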

It might be objected that the word-naming task is a relatively
indirect measure of proficiency (although correlations between
Spanish-English word-naming difference scores for particular domains, on
the one hand, and more direct proficiency measures on the other were
typically about .50, a respectable relationship considering the usual
relative unreliability of difference scores and the brevity of the
procedure which elicited them). A more direct language proficiency test--a
measure of listening comprehension in English and in Spanish--was employed
with the same adults who participated in the word-naming task. It differed
from conventional listening comprehension tests in that it was designed to
assess comprehension in terms of specific social contexts.

The listening comprehension test's stimulus material consisted of
five tape-recorded conversations between Spanish-English bilinguals living
in New York. The participants in all but one of the conversations were
Puerto Rican college students who spoke fluent, native English and who
were adept at switching between languages. In one conversation, one of
the speakers was a parish priest, who played himself in that role, and
whose Spanish was fluent but not native.

Each conversation was obtained in the following manner. First, the
"actors" agreed upon a social situation in which switching between English
and Spanish would be appropriate among Puerto Ricans in New York. Second,
they mapped out a story-line which determined the general direction of
the conversation in that situation (i.e. who would say what to whom), but
no scripts were prepared. The actors then assigned the roles to one
another and "role played," or ad-libbed, the scene, using English when
they felt English was appropriate and Spanish when they felt Spanish was
appropriate. Finally, they played back the conversation to themselves
to determine whether or not it sounded natural. If parts of the
conversation struck them as unnatural, those portions were re-recorded and
at a later time spliced into the tape. Each completed conversation lasted
between two and three minutes. Transcripts of the conversations can be
found in Fishman et al. (1971, pp. 675-694).

Each of the five conversations was intended to represent a different
type of social context. Consequently, the relationships between speakers
(e.g. mother-daughter, priest-parishioner), the locales or settings (e.g.
home, rectory), the topics of conversation (e.g. the annual Puerto Rican
parade, the health of an uncle), and the purpose of the interaction (e.g.
offering an invitation, dictating a letter) all varied from conversation
to conversation.

After a conversation had been played twice to the respondent, he was
asked a series of questions designed to assess his comprehension of the
passage. In addition to questions asked in order to test comprehension
of the English and Spanish portions of each conversation, other questions
were asked in order to assess the respondent's interpretation of various
aspects of the social situation represented by the conversation as a whole.
For example, respondents were asked to identify the role-relationships
between speakers (e.g. boss-secretary), the degree of social distance or
intimacy between speakers, the motivation underlying certain remarks made
by the speakers, and, for some conversations, the educational and
occupational status of the speakers.

For each conversation, the percentage of items assessing comprehension
of the English portion which each respondent correctly answered was
subtracted from the percentage which he correctly answered of items
assessing comprehension of the Spanish portion. The percentage correct of
the other types of items--assessing interpretation of various components
of the conversation as a whole, such as the role-relationship between
speakers--was also computed. Correctness was scored in terms of the
impression intended by the actors in their formulation of the social
situation to be presented.

The usefulness of contextualizing the listening comprehension test can
be seen from the differences which were observed according to
conversation. For example, there was a greater difference between the
average English and Spanish comprehension scores for the conversations
which took place within the context of work than for the conversations
which took place within the context of a home. Similarly, the relationship
between the ability to understand the manifest content of the conversation
(what was said) and the ability to interpret the latent content of the
conversation (what was meant) differed according to conversation. For
example, respondents correctly answered a greater proportion of latent
content items than of manifest content items for a conversation taking
place within a home, whereas the reverse was true for a conversation taking
place within an office. Thus, knowing what was said did not necessarily
enable listeners to absorb the full communicative impact of a
conversation; conversely, missing the details of manifest content did not
necessarily prevent listeners from grasping the speakers' intent.
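
The per-conversation scoring described above (percentage correct on the
Spanish-portion items minus percentage correct on the English-portion
items, plus a separate percentage for the interpretation items) might be
sketched as follows. This is a minimal illustration under stated
assumptions: item responses are represented as lists of booleans, the
function names are hypothetical, and the sample answers are invented.

```python
def percent_correct(answers):
    """Percentage of items answered correctly (answers: list of booleans)."""
    if not answers:
        raise ValueError("no items to score")
    return 100.0 * sum(answers) / len(answers)


def score_conversation(spanish_items, english_items, interpretation_items):
    """Score one conversation for one respondent.

    The difference score is Spanish percent correct minus English percent
    correct; interpretation items are scored separately.
    """
    return {
        "spanish_minus_english": (
            percent_correct(spanish_items) - percent_correct(english_items)
        ),
        "interpretation": percent_correct(interpretation_items),
    }


# Invented responses for a single respondent and conversation.
result = score_conversation(
    spanish_items=[True, True, True, False],      # 75% correct
    english_items=[True, False],                  # 50% correct
    interpretation_items=[True, True, False, False, True],  # 60% correct
)
print(result)
```

A positive difference score indicates better comprehension of the Spanish
portion; computing it separately for each conversation is what allows
differences by social context to appear.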

Techniques such as the contextualized listening comprehension test
can be used to assess the degree to which a community's rules of speaking
have been internalized. Thus, two contrasting groups to whom the test
was also administered (Anglo high school students studying Spanish and
South Americans studying at a university in New York) often agreed with
the Puerto Rican respondents about what was said but disagreed with them
(and with each other) about what was meant. Clearly, the communicative
(as distinguished from narrowly linguistic) competence of the three groups
differed not only from one another but from context to context, and many
of these differences would have been lost by conventional
noncontextualized measures of language proficiency.



Language Usage

Devices which are designed to assess the relative frequency with which a
person employs his languages or language varieties may be misleading if
they yield a single, overall score of language usage. Thus, if a person
is asked what language he uses every day and if he uses one language for
most everyday purposes but reserves another language for use in specified
social contexts, his response that he mainly uses the first language,
while true, is also misleading since it obscures his systematic use of
another language.

Two procedures for assessing language usage illustrate the advantage
of obtaining information about usage in different contexts. One measure
is a language-usage questionnaire developed in connection with the study
of Puerto Rican bilingualism mentioned above. Thirty-four schoolchildren,
aged 6 to 12, were interviewed. The children were asked a series of
questions about the degree to which they used Spanish relative to English
with bilingual interlocutors in school, at church, in the neighborhood,
and at home. For example, the children were asked to indicate the extent
to which they used Spanish with other Puerto Rican bilingual children when
playing outside on the street near their home. Responses were scored on a
5-point scale, with the exclusive use of Spanish at one end of the scale
and the exclusive use of English at the other. An average rating for the
use of Spanish across various interlocutors was computed for each
respondent and for each context. The children reported that they used more
Spanish than English in the contexts of neighborhood and family, and more
English than Spanish in the contexts of school and church. Their overall
use of English and Spanish, however, summed across all four contexts, was
about the same. Thus, a question designed to assess only their global use
of Spanish relative to English, without reference to the contexts of
language usage, would have been misleading.
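
The averaging step for the questionnaire can be sketched briefly. The
ratings below are invented, and the direction of the scale (5 = Spanish
only, 1 = English only) is an assumption made for illustration; the source
states only that the two exclusive uses sit at opposite ends of a 5-point
scale.

```python
from statistics import mean

# Invented ratings for one hypothetical child, grouped by context.
# Assumed coding: 5 = Spanish only ... 1 = English only.
ratings = {
    "home": [5, 4, 5],
    "neighborhood": [4, 4, 4],
    "school": [2, 1, 2],
    "church": [2, 2, 1],
}


def context_means(ratings_by_context):
    """Average rating across interlocutors within each context."""
    return {ctx: mean(vals) for ctx, vals in ratings_by_context.items()}


def overall_mean(ratings_by_context):
    """Single global average across every rating, ignoring context."""
    return mean(v for vals in ratings_by_context.values() for v in vals)


by_context = context_means(ratings)
overall = overall_mean(ratings)
print(by_context)
print("overall:", overall)
```

Here the global average sits at the midpoint of the scale, while the
context means fall on opposite sides of it.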

Whereas the first example of a contextualized measure of language
usage was obtained from self-reports, the second example was obtained from
the reports of outside observers. As part of a study of the status of
English in Israel, the use of English on a busy shopping street in
Jerusalem, and in the shops that line it, was described (Rosenbaum et al.,
1977). A transaction-count procedure (Bender et al., 1972) was employed
by which the number of persons heard speaking English was determined.
It was found that approximately 14% of all the persons heard (N=936) were
speaking English (the majority of speakers, of course, used Hebrew).
However, almost all of the interactions involving English were between
pedestrians talking to each other on the street or between customers
talking to each other in the shops. There was very little English observed
between customers and shopkeepers inside the shops, but this was not due
to the fact that the customers and shopkeepers did not know English. In
fact, most of them were able to conduct transactions in that language.
English was used mainly between native speakers of English, not as a
lingua franca, i.e. as a medium of communication between persons who do
not share the same mother tongue. In Israel, Hebrew is the lingua franca
par excellence. It is the language which is expected for use between
Israelis who do not share the same first language. Since almost none of
the shopkeepers spoke English natively, the native speakers of English
used Hebrew with them. Thus, the transaction-count procedure recorded an
important systematic difference in language usage by taking into account
the relationships between speakers. If only a single count had been
made--of all speakers across all contexts--this difference would have been
missed.
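
A tally of this kind can be sketched as a count broken down by the
relationship between speakers. The observations below are invented and far
smaller than the study's sample; the relationship labels and the function
name are hypothetical, and the sketch is not a reconstruction of the
published transaction-count procedure.

```python
from collections import Counter


def english_share(observations):
    """Return (overall share, share per relationship) of English use.

    observations: iterable of (relationship, language) tuples, one per
    speaker overheard.
    """
    totals = Counter()
    english = Counter()
    for relationship, language in observations:
        totals[relationship] += 1
        if language == "English":
            english[relationship] += 1
    per_relationship = {r: english[r] / totals[r] for r in totals}
    overall = sum(english.values()) / sum(totals.values())
    return overall, per_relationship


# Invented observations: 20 speakers in two kinds of interaction.
observations = (
    [("pedestrian-pedestrian", "English")] * 3
    + [("pedestrian-pedestrian", "Hebrew")] * 7
    + [("customer-shopkeeper", "Hebrew")] * 10
)

overall, per_relationship = english_share(observations)
print("overall:", overall)
print(per_relationship)
```

The single overall proportion conceals that, in this invented sample, all
of the English occurs between pedestrians and none between customers and
shopkeepers.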

Language Attitude

Just as global measures of language proficiency and language usage may
be misleading, so global measures of language attitude may obscure
important systematic differences. Attitudes toward a language or
attitudes toward a referent for which language serves as a symbol may vary
according to the context in which the language is used. The effectiveness
of a contextualized approach to the study of language attitudes can
be illustrated by two studies, the first by Carranza and Ryan (1975) and
the second by El-Dash and Tucker (1975).

Carranza and Ryan asked bilingual Anglo and Mexican American high
school students to rate speakers of English and Spanish on the basis of
voice cues alone. Following the work of Lambert (1967) and his associates,
a comparison of evaluative reactions to speakers of two languages was
used as an indirect measure of interethnic attitudes. Such a procedure
typically employs, as stimulus material, tape recordings of speakers
reading aloud a standard passage. This procedure has been criticized on
the grounds that differences in ratings of speakers in the two languages
may occur if the passage read represents a context inappropriate to one
of the languages (Agheyisi and Fishman, 1970). Accordingly, Carranza and
Ryan used two speech contexts. In one, a mother is talking as she
prepares breakfast for her family; in the other, a teacher is giving a
history lesson to her class. These contexts were designed to represent
home and school, respectively. Each context was recorded in each language,
yielding four passages in all. Respondents were asked to rate each of
sixteen different speakers (each reading one of the four passages so that
each passage was heard four times) on each of 15 semantic-differential
scales. Their responses demonstrated a striking interaction between
language and context. The English versions of the school context were more
highly rated than the Spanish versions, whereas the reverse was true for
the home context. For these respondents, attitudes towards each language
(or towards the group represented by the language) were in part a function
of the appropriateness of the context in which each was used. If only
one context had been employed, the results would have been misleading.
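
The idea of a language-by-context interaction can be illustrated with a
difference-of-differences on the four cell means. The ratings below are
invented and the function names are hypothetical; the sketch shows only
the arithmetic of a crossover pattern, not the analysis the original
authors performed.

```python
def interaction_contrast(cell_means):
    """English-minus-Spanish gap at school minus the same gap at home."""
    school_gap = cell_means[("English", "school")] - cell_means[("Spanish", "school")]
    home_gap = cell_means[("English", "home")] - cell_means[("Spanish", "home")]
    return school_gap - home_gap


def language_main_effect(cell_means):
    """English-minus-Spanish gap averaged over both contexts."""
    english = (cell_means[("English", "school")] + cell_means[("English", "home")]) / 2
    spanish = (cell_means[("Spanish", "school")] + cell_means[("Spanish", "home")]) / 2
    return english - spanish


# Invented mean ratings for the four passages (language x context).
cell_means = {
    ("English", "school"): 5.0,
    ("Spanish", "school"): 4.0,
    ("English", "home"): 4.0,
    ("Spanish", "home"): 5.0,
}

contrast = interaction_contrast(cell_means)
main_effect = language_main_effect(cell_means)
print("interaction contrast:", contrast)
print("language main effect:", main_effect)
```

With these invented means the languages are rated equally on average, yet
the contrast is nonzero because the preferred language reverses between
school and home. Using either context alone would suggest a simple
preference for one language.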

In the research reported by El-Dash and Tucker, the views of Egyptians
toward Classical Arabic, Colloquial Arabic, and three varieties of
English (American, British, and Egyptian) were studied. Respondents were
asked to rate various personal characteristics of speakers heard on tape
recordings, which represented each of the five language varieties. They
were also asked to rate each speech variety heard with respect to its
suitability for each of five contexts (at home, at school, at work, on
radio and television, and for formal and religious speeches). While the
respondents tended to rate speakers heard using Classical Arabic more
favorably than speakers heard using the other speech varieties, Classical
Arabic was not considered suitable for use at home. In this context,
Colloquial Arabic was preferred. Thus, attitudes towards language use
were found to be a function of communicative context. Again, a global
measure would have obscured this result.

SUMMARY 

This article has justified a sociolinguistic approach to the construction
of language assessment devices on the grounds that it is both necessary
and possible to use such an approach. It is necessary because language
behavior and behavior toward language vary as a function of communicative
context. That it is possible has been demonstrated by the examples of
contextualized measures presented here. Authors of language assessment
procedures, therefore, should consider explicitly the contexts in which
the behaviors they wish to describe take place. The more the criterion
behavior varies as a function of context, the more important it is to
construct techniques which can reflect variation.


Pragmatics and Language Testing

John W. Oller, Jr.




It is interesting that we often speak of the coinage of terms. This
metaphor is doubly effective if you consider that terms are more or less
stamped into existence by a mentor (if I may be allowed a bad pun), and
they have a certain purchase value like any other coin. The coin meta-
phor is particularly apropos to the term "pragmatics" which, in the words
of William James, emphasizes the "cash value" of linguistic elements as
negotiable items in communication. It comes from a brand of American
philosophy initiated by Charles S. Peirce at about the turn of the century,
and his thinking was extended by William James, John Dewey, and Charles
Morris.2 Although all of these scholars were Americans, the methods and
assumptions of what may be called a "pragmatic approach to language
study" are by no means unique to Americans, nor are they a recent devel-
opment.

This paper discusses in historical perspective the major concepts of
pragmatics and relates them to language testing. Grammar is viewed as a
theory of language competence and is characterized as a pragmatically-
generated expectancy device. It is claimed that in order to adequately
measure language skills, language tests--whether for first, second, or
foreign language learners--must activate the internalized expectancy
grammar of the learner. Empirical data showing remarkably high correla-
tions on very different tests of language skills are explained on the
basis of the postulated expectancy grammar. It is hypothesized that other
tests of language skill which fail to produce high correlation with effec-
tive "integrative tests" (the term is from Carroll, 1961) are probably
invalid as measures of language proficiency.

The claim that a person who learns a language internalizes a grammar,
i.e. a generative system that specifies relationships between sound and
meaning in the language, is now widely accepted, though there is still a
lot of debate about the form of such a grammar. For instance, there is
disagreement about whether it is primarily motivated by syntactic, seman-
tic, or pragmatic considerations, whether it is more or less generated
by principles of learning such as induction and substitution (Oller,
1972c), or whether it is in substantial portion already present in the
brain of an infant at birth. In this connection, it may be useful to note
that the transformational generative approach seems to be evolving in the
direction of a pragmatically-motivated theory of grammar. The first
stage of the Chomskyan paradigm was the position that syntax and semantics
were independent (see Chomsky, 1957, reviewed and criticized by Reichling,
1961; Jakobson, 1959; and others); the second was that syntax and seman-
tics were not independent but together were independent of pragmatic
considerations (see Chomsky, 1965; Katz and Fodor, 1963; and Katz and
Postal, 1964; also, see criticisms by Uhlenbeck, 1967; and Oller et al.,
1969); the third stage which now seems to be emerging is that syntactic,
semantic, and pragmatic factors are intricately interrelated and may, in
fact, be inseparable. This latest development is illustrated in the fol-
lowing excerpt from Language and Mind (Chomsky, 1972):

     It is not clear at all that it is possible to distinguish
     sharply between the contribution of grammar to the determination
     of meanings and the contribution of so called 'pragmatic consid-
     erations,' questions of fact and belief and context of utterance.
     It is perhaps worth mentioning that rather similar questions can
     be raised about the notion 'phonetic representation'. Although
     the latter is one of the best established and least controver-
     sial notions of linguistic theory, we can, nevertheless, raise
     the question whether or not it is a legitimate abstraction,
     whether a deeper understanding of the use of language might not
     show that factors that go beyond grammatical structure enter
     into the determination of perceptual representation and physical
     form in an inextricable fashion, and cannot be separated without
     distortion, from the formal rules that interpret surface struc-
     ture as phonetic form (p. 111).

Apparently, Chomsky now sees theories of both sound and meaning as sus-
ceptible to reinterpretation due to pragmatic facts. Concerning phonetic
representations, Dennis Sales and I had advanced the same basic argument
as early as 1969, on the basis of a series of demonstrations showing that
controlled variations in extralinguistic contexts systematically brought
about changes in the stress patterns of the surface forms of utterances.

Further support for Chomsky's somewhat cautious hint is provided by
D. K. Oller and Eilers (1975), who showed that the quality of phonetic
transcriptions of children's utterances is improved when the transcribers
are either able to guess or are told the meanings of the utterances.
Even more recently, and perhaps more confidently than Chomsky, Fillmore
(1973) has made a strong case for the importance of pragmatic factors in
language teaching. In fact, his remarks parallel closely some of the
observations on the importance of pragmatics in several earlier publica-
tions on the same topic (compare Oller, 1970b, 1971b).

The chief argument in favor of a pragmatic approach to language
derives from the principle of non-summativity. A theory of language that
attempts to divorce syntax from semantics can no more hope to explain
language communication than a good book on spelling can hope to explain
the logic of a novel. The same sort of reasoning suggests that any
attempt to account for meaning apart from situational context (i.e.
semantics divorced from pragmatics) is doomed to inadequacy. With the
present re-examination of the whole question of the relation between lan-
guage and extralinguistic contexts, it seems that a growing interest in
pragmatics is likely to be a major theme in linguistic analysis for some
years. Although there are many important unanswered questions, there no
longer seems to be any substantial disagreement concerning the fundamental
need for a pragmatically-based account of language use and language
learning. Scarcely anyone is still seriously maintaining that grammar
can be regarded as a self-contained entity independent of extralinguistic
contexts. This seems to be an indication of progress. However, the
turn to pragmatics does not represent the birth of a new approach so
much as a return to a useful tradition of language study. On the other
hand, it does constitute a significant change in current trends of research
in the language sciences.

As Solomon said, "There is nothing new under the sun," and much of what
seems to be progress is without doubt merely a restatement in current
terminology of notions that were held true by the ancients and have only
been rediscovered in their modern contexts. A serious student of the
nature of human communication and mental behavior can ill afford a con-
temptuous attitude toward early thinking on these topics (Chomsky, 1972,
and Cherry, 1965). One could compile a great compendium of pithy quota-
tions showing that ancient scholars and many of their progeny were aware
of the importance of the fact that language often relates to things
other than language. What is remarkable is that some linguists in recent
decades seem, temporarily at least, to have forgotten and even actively
neglected so important and obvious a fact. This, of course, is the
reason that it is necessary to stress the pragmatic nature of language.

One of the early indications of concern among language theorists
for the pragmatic aspects of natural languages was the theorizing of the
Danish Modistae in the thirteenth and fourteenth centuries. Bursill-Hall
(1971) claims that they "constructed their theory [of grammar] on extra-
linguistic facts based on the structure of reality (p. 35, footnote
84)." Their concern for universal grammar was a tradition continued by
the Port-Royal grammarians of the seventeenth century (Chomsky, 1966;
Aarsleff, 1967). The emphasis of both schools on the rational explana-
tion of grammatical categories in terms of intrinsic logic and extralin-
guistic fact is evidence of their concern for what has been termed
pragmatic mappings (Oller, 1975a). John Locke (1690), who was a late
contemporary of the Port-Royal grammarians, even went so far as to argue
that "all words are taken from the operations of sensible things,...
(cited in Kuhlwein, 1971, p. 53)."

Pragmatics distinguishes two basic levels of communication that are
employed to relate linguistic elements and extralinguistic situations.
James Harris (1751) proposed the basis for this distinction only a
century after the heyday of the Port-Royal school (and I doubt that he
was the first to notice it): "The Truth is, that every Medium thro'
which we exhibit any thing to another's Contemplation, is either derived
from Natural Attributes, and then it is an Imitation; or else from
Accidents quite arbitrary, and then it is a Symbol (his italics, cited
in Kuhlwein, 1971, p. 69)." In other words, we may either use pictures
or abstract symbols to map or portray extralinguistic facts. In natural
languages we usually use both, though we tend to rely more heavily on
abstract symbolic means for the communication of cognitive context and
on facial expression and tone of voice for the conveyance of attitudinal
information. As James Beattie (1788) put it, "the Natural signs of
thought are those changes in complexion, eyes, features, and attitude,
and those peculiar tones of voice which all men know to be significant
of certain passions and sentiments (cited in Kuhlwein, 1971, p. 97)."
The brandished fist may be a picture of a threatened slug in the mouth,
or a symbol of brotherly solidarity, just as a smile is often a sign of
friendliness, or sometimes hideous spite. But postural and gestural
changes are in themselves quite inadequate to code much of the informa-
tion that the human mind negotiates. Beattie says, "when compared with
the endless variety of our ideas, these Natural Signs will appear to be
but few. And many thoughts there are in the mind of every man, which
produce no sensible alteration in the body (p. 97)." He goes on, "Arti-
ficial Signs, or Language, have, therefore, been employed for the purpose
of communicating thought, and are found so convenient as to have super-
seded in a great measure, the use of the Natural (p. 97)." Whether or
not we agree with Beattie's intimations about language evolving from a
need to communicate more abstract notions or ideas, his remarks clearly
differentiate the two fundamental modes of communication distinguished
in 20th century pragmatic theories of language (Watzlawick et al., 1967).

Of course, pragmatic mappings are abstract, and simple explanations
that try to relate words and things directly are quite unsatisfactory.
Paralleling the earlier statement quoted from Locke, Dugald Stewart
(1810) wrote, "I have...remarked the disposition of the Mind to have
recourse to metaphors borrowed from the Material World.... This analogi-
cal reference to the Material World adds greatly to the difficulty of
analyzing with philosophical rigour, the various faculties and principles
of our nature, yet it cannot be denied, that it facilitates, to a wonder-
ful degree, the mutual communications of mankind concerning them,...
(cited in Kuhlwein, 1971, pp. 100-101)."

Though serious scholars have often noted the difficulty of incorpo-
rating into theories of language abstractions that relate words to things
(Bloomfield, 1933; Harris, 1951; and Chomsky, 1957, 1965; just to mention
a few), the fact that such relations exist is one that is ignored at
great peril to the theories. Surprisingly, Bloomfieldian and early
Chomskyan writings are about the only prominent sources of language theo-
ries that use the ostrich approach to pragmatic data, excepting possibly
the positivistic philosophy of Rudolf Carnap and his followers. The
lectures of de Saussure, about 1912, compiled by his students (1959), and
the writings of B. Malinowski (1935), L. Hjelmslev (1954), J. R. Firth
(1957), the Prague School (Vachek, 1966), Sidney Lamb (1966, 1973),
M. A. K. Halliday (1961, 1977), and many others have maintained cognizance
of the fact that language is used for purposes other than putting words
together in a neat arrangement with other words. Since the writings of
most of these more recent authors are better known and more readily avail-
able, we will just refer briefly to a note about Firth. Robins (1963)
says, "meaning, the object of all linguistic analysis in Firth's approach,
is function in a context, whether the extralinguistic context of situa-
tion or the intralinguistic contexts of grammar, phonology, or other
subsidiary levels (reprinted in Kuhlwein, 1971, p. 9)."

Nor have linguists (excluding the Bloomfieldians and early Chomsky-
ans) been the only scholars concerned with pragmatic aspects of language
structure. Psychologists (especially Osgood, 1957b), philosophers
(especially Russell, 1940), logicians (especially Reichenbach, 1947),
communication experts (Cherry, 1965), and even physicists (Einstein,
1951) have persistently evidenced concern for the fact that language is
intrinsically structured for the codification of information that is
largely non-linguistic. Not long ago, a group of logicians and philoso-
phers met in Jerusalem to discuss the importance of pragmatics to theories
of natural languages (Bar-Hillel, 1971). More recently a journal has
been created on the topic. Obviously, a great deal more could be, and
perhaps should be said in this vein, but the main theme of this volume,
which is language testing, draws us in another direction.

As Upshur (1972) observed, trends in language teaching have tended
to trail along in the wake of linguistic theories, and trends in testing,
at least in second or foreign language testing, have been similarly
tagging along behind the prevalent methods and theories of language
teaching. Witness the influential writings of Lado (1957) on methods of
language teaching patterned after the "scientific" analysis of language,
and his companion volume on language testing (Lado, 1961). These and
many other books and articles on language testing and teaching were
heavily influenced by theories that deliberately ignored the data of
pragmatics (Oller, 1971, 1973a, and forthcoming). The "discrete point"
methods of teaching and testing are both naive concerning the fact that
a totality is greater than just a heap of unrelated parts.

Fortunately, many applied linguists rejected the naivete that was
characteristic of dominant theories in the late 1940's, the 1950's, and
through the mid-1960's. For example, as early as 1904, Otto Jespersen,
the Danish linguist, was arguing that materials designed to teach a
foreign language should have meaningful sequence throughout. He realized
that if linguistic structures are presented outside of a meaningful con-
text, learning will be more difficult. This has subsequently been
demonstrated many times over. (For a review of the literature, see Oller,
1971b.) As Jespersen put it,

     ...we ought to learn a language through sensible communications;
     there must be (and this as far as possible from the very first
     day) a certain connection in the thoughts communicated in the
     new language...one cannot say anything with mere lists of words.
     Indeed not even disconnected sentences ought to be used....When
     people say that instruction in languages ought to be a kind of
     mental gymnastics, I do not know if one of the things they have
     in mind is...sudden and violent leaps from one range of ideas
     to another (p. 11).

The reason that pragmatically based language teaching materials can be
expected to be more effective has been made clear in numerous psycholin-
guistic studies in recent years. If the learner is made aware of the
pragmatic contexts to which language structures relate, he has a much
more powerful basis for subconsciously constructing and thereby internal-
izing the grammar of the language. The partially predictable sequence
of events in communicative contexts is one sort of data that the learner
can capitalize on to great benefit. In fact, if pragmatic mappings of
utterances onto contexts are not made available to the learner, there is
no reason to suppose that language acquisition can occur at all.

TOWARD A DESCRIPTION OF A PRAGMATIC EXPECTANCY GRAMMAR

Let us now turn our attention to the characterization of grammar as a
model of underlying language competence. (Later we will relate these
considerations to language testing.) We will first discuss some empiri-
cal facts of language use which suggest that one of the important charac-
teristics of such a grammar must be a capability to generate expectancies
based on contextual dependencies. With this in mind, we will use the
term "expectancy grammar." Some empirical data will be cited in support
of this notion, and a partial formalism will be described. Then we will
consider some findings of research in language testing and attempt to
draw some inferences about valid language tests.

For some years now, it has been popular to speak of the perception of
language as a process of analysis-by-synthesis. The evidence that such
a process underlies the perception of linguistic sequences--whether in
listening, reading, or some combination of the two, as when following a
text that is being read aloud by someone else--is pervasive. The motor
theory of speech perception, proposed by Liberman (1957) and others at
Haskins Laboratories, maintained that the perception of distinctive
sound segments was mediated by the articulatory processes necessary to
produce those segments. This notion provided the seminal basis for the
later theory of analysis-by-synthesis developed by Stevens (1960). It
is well-known that some phonemes of English, for instance, and especially
the distinctive intonational contours of English, are often indistin-
guishable without reference to higher-level contexts (Lieberman, 1967;
Stevens, 1960). We have already noted Chomsky's remarks (1972, p. 111)
in this vein and the evidence in support of them. Much recent research
in the perception of spoken and written forms of language suggests that
there is a close relationship between perceptual processes and the prag-
matic structure of language (see the articles in Horton and Jenkins,
1971; also, Kavanagh and Mattingly, 1972).

It seems that the perception of linguistic sequences is mediated by
an expectancy grammar that is continually formulating, modifying, and
reformulating hypotheses about the underlying structure and meaning of
input signals. These hypotheses are related via pragmatic mappings to
extralinguistic contexts and are instrumental in the analysis of the
surface form. Chomsky and Halle (1968) suggest that, "the hypothesis
will...be accepted if it is not too radically at variance with the
acoustic material." Or, putting it differently, the perceiver seems to
rapidly alternate between a synthesis that is "fast" and "crude" and an
analysis that is "deliberate, attentive,...and sequential" (Neisser,
1967).

A growing body of experimental data on listening and reading pro-
cesses lends credence to the analysis-by-synthesis model of perception.
As Levin and Kaplan (1971) put it, "listeners and readers alike, appear
to decode sentences not only by interpreting as they hear or read, but
also by anticipating what is likely to come next (p. 2)." Kolers (1971)
argues that the reader or listener seldom makes a specific guess as to
what word, phrase, or sentence is likely to follow. Rather, he generates
a kind of readiness for a range of possibilities. It is as though the
perceiver were prepared for answers to certain questions [...]
searching for answers to specific questions.

The synthesis, or hypothesis, that the perceiver generates as a match
for the input signal is characterized in a natural way in terms of a
grammar of expectancy. The perceiver's hypothesis about the input sig-
nal is largely based on the contextual constraints that are utilized by
the internalized grammar. The hypothesis that is eventually accepted is
what the perceiver hears, reads, or understands. In perception, the
expectations generated by the grammar are subject to modification when-
ever they fail to produce a sufficient match for the incoming signal.

Creative errors in reading and listening provide dramatic evidence for
this process. For example, one foreign student taking a dictation test
at UCLA transformed an entire paragraph on "brain cells" into a fairly
readable text on "brand sales." The student's rendition was similar
phonetically to the original passage on a phrase by phrase basis, but
completely obliterated the original content. Less remarkable examples
illustrate the same underlying process. On another dictation test, for
instance, students wrote "scientist's imaginations" and "scientist's
examinations" for "scientists from many nations." Such
errors are actually quite common among non-native speakers. Richards
(1971a, 1971b) has drawn some interesting conclusions about underlying
processes based on learners' errors, as has Corder (1971).

Although the process of analysis-by-synthesis has been quite widely
accepted by researchers as a plausible basis for the perception of lan-
guage, to my knowledge the proposal that an analogous process underlies
the production of language (Oller, 1975b, 1974a and b, and forthcoming)
has only been tentatively suggested. Nevertheless, it seems that language
production is a kind of synthesis-by-analysis. The speaker or writer has
an idea that he wants to communicate, but, as Colin Cherry (1965) has
said, he never really has it until he "jumps on it with both verbal feet."
It seems that the speaker or writer has a notion of what he wants to say--
a sort of hypothesis or prior synthesis, if you like--and he analyzes it
by putting it into words. In this way, we may conveniently explain the
potent observation by Dewey (1926) that the words which come out of a
person's mouth often surprise that person more than anyone else. In the
case of language production in contrast to perception, expectancies are
the governing factor, and the physical signal is adjusted to match them.
In performing the synthesis-by-analysis, it is frequently the case that
details and relationships previously unavailable on a conscious level do
become available consciously, and, hence, the surprise value to ourselves
of the things that we say.

At this point, an empirical example may help to make clear how a
grammar of expectancy serves to explain crucial aspects of language use.
The example will also provide a bridge to the description of the tentative
formalism for a grammar of expectancy which is discussed below.

Clark (1966) has shown that phrases referring to actor, action, and
recipient in transitive sentences are differentially constrained in actives
and passives. In the passive, as Levin and Kaplan (1971) observe, "the
latter part [referring to] the...[action] and the actor is highly con-
strained by the former part, the...[recipient]; this was not true for the
corresponding parts of active sentences (p. 4)." Clark (1966) and Roberts
(1966) have shown further that recall for actives and passives is governed
by the uncertainties predicted by Clark's earlier experiment. The experi-
ments of Levin and Kaplan (1971) themselves, with eye-voice span (EVS),
confirmed their prediction that EVS would increase in the middle of pas-
sive sentences but not in actives. Results that support similar generali-
zations have been achieved with [...].

Levin and Kaplan (1971) suggest that

     the important point is that the constraints facilitate proces-
     sing only insofar as they lead to the formation of successful
     anticipations. The reader then can test his hypothesis for him-
     self. If it is confirmed, the previously assigned interpreta-
     tion is accepted and the material can be easily and efficiently
     processed. If he cannot confirm his previously assigned inter-
     pretation he must backtrack and reassign interpretations, which
     seems easier to do in reading than in listening. To elaborate,
     successful hypothesis generation depends on the ability to formu-
     late, or assign some tentative interpretation to what has been
     read or heard (p. 13).

Further confirmation is provided by Wanat and Levin (1968) and Wanat (1971).

The empirical data supporting the operation of an expectancy grammar,
a device that generates and confirms hypotheses, is overwhelmingly affirm-
ative. The major question that remains is what specific form a grammar of
expectancy might take. In other words, how can the notion be formalized
in a suggestive and helpful way? More importantly, one may want the for-
malism itself to be vulnerable to empirical tests in a variety of ways.
In particular, it should mirror the utilization of contextual constraints
in a way that naturally accounts for the data of language use. It should
also be conveniently modifiable wherever it fails to predict the data.

The first attempt at such a formalism was notably unsuccessful. It
was, in fact, a left-to-right finite state grammar which operated only
on constraints from one word to the next. A significant inadequacy of
such a mechanism was its inability to suspend processing of a certain seg-
ment while dealing with an intervening one: it had no way of remembering
to return to the earlier segment in order to continue its work there.
Related to this debilitating difficulty was the lack of capability in
handling recursive functions where elements of indefinite length might be
strung together or imbedded in complicated ways. The observations of
Lashley (1951) showed the total inadequacy of such devices as models of
even the simplest sorts of human behavior. The work of Chomsky (1956)
provided a mathematical proof for the inadequacy of finite state models.
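
The weakness of a device that consults only word-to-word constraints can be illustrated with a minimal sketch. It is not the formalism referred to above; the sample sentences and the names `bigrams`, `train`, `accepts`, `SAMPLE`, and `ALLOWED` are assumptions made for illustration. Because the device remembers nothing beyond the previous word, it cannot return to the subject once a relative clause has intervened, and so it accepts a string whose verb fails to agree with that subject.

```python
def bigrams(sentence):
    """Adjacent word pairs, with start and end markers added."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return list(zip(words, words[1:]))

def train(sentences):
    """Collect every word-to-word transition seen in the sample."""
    allowed = set()
    for sentence in sentences:
        allowed.update(bigrams(sentence))
    return allowed

def accepts(allowed, sentence):
    """Accept a sentence if each adjacent pair is an allowed transition."""
    return all(pair in allowed for pair in bigrams(sentence))

# Hypothetical sample: singular and plural subjects, each followed by
# an intervening relative clause before the main verb.
SAMPLE = ["the boy who left runs", "the boys who left run"]
ALLOWED = train(SAMPLE)

if __name__ == "__main__":
    print(accepts(ALLOWED, "the boy who left runs"))  # grammatical
    print(accepts(ALLOWED, "the boy who left run"))   # agreement violated
```

Both strings are accepted, since every adjacent pair in the second one was observed somewhere in the sample; the dependency between "boy" and "runs" is simply invisible to the device.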

A second stage in the attempt to come to grips with the sequential
nature of language processing was achieved independently and nearly simul-
taneously by a number of researchers in diverse areas. "Finite state
devices," "transition network grammars," or "directed graph models," as
they were variously called, were modified in important ways to achieve
recursive generative capacity (Newcomb, 1963; Johnson, 1965; Conway,
1964). While in most cases the generative power of the mechanisms achieved
at this stage was equivalent only to context-free phrase structure gram-
mars, they maintained the virtue of a fairly straightforward account of
the sequential nature of a great deal of language processing. This virtue
is not shared by the phrase structure grammars usually written by lin-
guists operating in the Chomskyan tradition. Moreover, recursive transi-
tion network models allow for the expression of certain grammatical
regularities in more economical and natural ways than phrase structure
rewrite rules. They are more convenient in that the effects of modifica-
tions in the grammar [...] phrase
structure grammars.
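
How a transition network gains recursive capacity while still working through the input from left to right can be sketched as follows. This is an illustrative toy, not any of the cited systems: the lexicon, the three networks, and the names `LEXICON`, `NETWORKS`, `traverse`, and `recognizes` are all assumptions. An arc labeled with a word category consumes one word; an arc labeled with the name of another network suspends the current network, runs the named one, and then resumes.

```python
# Toy lexicon mapping words to categories (assumed for illustration).
LEXICON = {
    "the": "DET", "boy": "N", "pony": "N", "barn": "N",
    "near": "P", "saw": "V",
}

# Each network maps a state to its outgoing arcs: (label, next_state).
# A label that names a network is a recursive "push"; "end" is final.
NETWORKS = {
    "S":  {0: [("NP", 1)], 1: [("V", 2)], 2: [("NP", "end")]},
    "NP": {0: [("DET", 1)], 1: [("N", "end"), ("N", 2)], 2: [("PP", "end")]},
    "PP": {0: [("P", 1)], 1: [("NP", "end")]},
}

def traverse(net, state, words, i):
    """Yield each input position at which `net` can finish, starting at i."""
    if state == "end":
        yield i
        return
    for label, nxt in NETWORKS[net][state]:
        if label in NETWORKS:
            # Push: run the sub-network, then resume this one where it ended.
            for j in traverse(label, 0, words, i):
                yield from traverse(net, nxt, words, j)
        elif i < len(words) and LEXICON.get(words[i]) == label:
            yield from traverse(net, nxt, words, i + 1)

def recognizes(sentence):
    """True if the S network can consume the whole sentence."""
    words = sentence.split()
    return any(j == len(words) for j in traverse("S", 0, words, 0))

if __name__ == "__main__":
    print(recognizes("the boy near the pony near the barn saw the pony"))
```

Since NP may contain PP and PP contains NP, noun phrases of indefinite length are handled, which a word-to-word device cannot do. The sketch only accepts or rejects; it builds no structure and can move nothing around.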

Nevertheless, without further modification in the direction of greater
complexity, recursive transition models are inadequate in a number of
important ways. Woods (1970) observed they are not able to "move
fragments of the sentence around (so that their positions in deep struc-
ture are different from those in the surface structure), to copy and
delete fragments of sentence structure, and to make actions on constitu-
ents generally dependent on the context in which those constituents occur
(p. 592)." Woods proposed a solution to these problems in the form of
what he calls "an augmented recursive transition network grammar (p. 591)."
The grammar Woods has developed does not utilize pragmatic constraints of
extralinguistic context, but in spite of this limitation it provides a
useful formalism for the notion "expectancy grammar" as we have used the
term here. Moreover, such a grammar can be modified to incorporate prag-
matic information.

The changes that Woods (1970, 1972] imposes on the ""'recursive transi 
tion network** in orci'^r o achi*^ve what he calls an **aug"iented rer^irsivo 
t ran*^ ition n<*tworV." "^r mp 1 v ?it> "on ^ment ed trnn-itif^i^ n*^t worV." nr«^ 
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various "conditions" which must be met if a transition is to be followed, 
as well as certain "actions" or formal operations on constituents that 
are to be executed if the transition' is followed. The augmented grammar 
that Woods has developed is capable of keeping track of tentative deci- 
sions already made, and of modifying them as more input is analyzed. At 
the same time, it is constantly keeping track of the limits of subsequent 
possibilities by anticipatory operations based on the information avail- 
able to a given point. Woods says, "structure building actions associ- 
ated with the arcs of the grammar network allow for the re-ordering, 
re-structuring, and copying of constituents necessary to produce deep 
structure representations of the type normally obtained from a transforma- 
tional analysis, and conditions on arcs allow for a powerful selectivity 
which can rule out meaningless analyses and take advantage of semantic 
information to guide the parsing (1970, p. 591)."
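
The notions of "conditions" and "actions" on arcs can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not Woods's implementation: the state names (`start`, `after_det`, `np_done`), the toy lexicon, and the register names are hypothetical, and the sketch follows the first arc whose condition is met, without the tentative decisions and revisions that a full augmented network allows.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Registers = Dict[str, Any]

@dataclass
class Arc:
    label: str                                    # name of the transition
    target: str                                   # state reached if the arc is followed
    condition: Callable[[str, Registers], bool]   # must hold for the arc to be followed
    action: Callable[[str, Registers], None]      # structure-building step when followed

# Toy lexicon (an assumption for illustration only).
LEXICON = {"the": "Det", "little": "Adj", "spotted": "Adj", "boy": "N", "pony": "N"}

def is_category(category: str) -> Callable[[str, Registers], bool]:
    return lambda word, registers: LEXICON.get(word) == category

def set_register(name: str) -> Callable[[str, Registers], None]:
    def act(word: str, registers: Registers) -> None:
        registers[name] = word
    return act

def append_register(name: str) -> Callable[[str, Registers], None]:
    def act(word: str, registers: Registers) -> None:
        registers.setdefault(name, []).append(word)
    return act

# A noun phrase sub-network: Det, then any number of Adj, then a head N.
NP_NETWORK: Dict[str, List[Arc]] = {
    "start": [Arc("Det", "after_det", is_category("Det"), set_register("det"))],
    "after_det": [
        Arc("Adj", "after_det", is_category("Adj"), append_register("adjs")),
        Arc("N", "np_done", is_category("N"), set_register("head")),
    ],
    "np_done": [],
}
FINAL_STATES = {"np_done"}

def parse_np(words: List[str]) -> Optional[Tuple[Registers, int]]:
    """Follow arcs word by word; return (registers, words consumed) or None."""
    state, registers, position = "start", {}, 0
    while position < len(words):
        for arc in NP_NETWORK[state]:
            if arc.condition(words[position], registers):
                arc.action(words[position], registers)
                state = arc.target
                position += 1
                break
        else:
            break  # no arc applies: stop and let the caller decide what follows
    return (registers, position) if state in FINAL_STATES else None
```

Called on the opening words of example sentences like those discussed in this paper, e.g. `parse_np(["the", "little", "boy", "was"])`, the sketch fills the `det`, `adjs`, and `head` registers and stops before the verb, which corresponds to "popping" out of the noun phrase sub-routine.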

To illustrate the functioning of an "augmented transition network
grammar," or an "expectancy grammar," we may refer to the differential
processing of active and passive sentences noted earlier in this paper.
Examples a and b below are parallel in several respects. However, a is
"passive" while b is "active" (at least as these terms are defined in the
research mentioned earlier). To be more correct technically, a uses a
transitive verb in the passive while b uses an intransitive verb.

(a) The little boy was bucked off by the spotted pony.
(b) The little boy was gone away by the next day.

An expectancy grammar with the properties described by Woods (1970, 1972),
with modifications to take pragmatic information into account (as do real
speaker-hearers), might recognize and interpret a and b in roughly the
way described below. For the sake of the example, we assume a phonetic
analyzer plus an expectancy grammar. Many grammatical details are omitted
and we reify the grammar; that is, we assume that it is somehow internal-
ized in the brain of the speaker-hearer.

The first input word made available to the expectancy grammar from a
is the. This element can be segmented from little and the following
elements inasmuch as the grammar has no lexical entry corresponding to
phonetic sequences of [ðəl], [ðəlɪ], [ðəlɪt], etc. It recognizes the as
a determiner. Hence, it knows that the is the beginning of a noun phrase
(of course, we are assuming that the speaker has not made a false start, or
any one of a number of other possibilities which the grammar could eventu-
ally rule out anyway on the basis of subsequent information). The struc-
ture building operation which is executed [...] sentence routine pictured
in Figure 1.

The first arc in that routine corresponds to a noun phrase sub-routine
which is expanded in Figure 2. For the subject of the declarative sen-
tence, the grammar stores in its memory register the fact that the first
word in the string being processed is the determiner the. Provided this
analysis is correct so far, the grammar knows that several subsequent
possibilities are likely. The determiner may be followed by an intensi-
fier or string of them, modifying an adjective or string of them, modi-
fying a head noun. Or, it may simply be followed by a head noun. In
terms of the sub-routine in Figure 2, the grammar has already taken the
transition labeled "Det" to arrive at state q[...]. The grammar anticipates
a noun to follow, which is represented in the transition to state q[...].

In scanning the next word it discovers little. This phonetic sequence
is segmented from the phonetic sequences of [lɪɾlb], [lɪɾlbɔ], [lɪɾlbɔɪ],
etc. This is accomplished by virtue of the fact that the word little is
the first phonetic sequence that not only matches a lexical entry in the
grammar, but also is an adjective which allows a transition in the gram-
mar from state q[...] to q[...]. The grammar now anticipates either another
adjective or the head noun to follow. Since little is semantically
marked as a size indicator, the grammar anticipates a head noun that prag-
matically maps onto a pre-specified physical object which has size and
shape dimensions in some extralinguistic context. The next word scanned
is boy. (Henceforth, reference to phonetic segmentation is omitted.)
Since boy is a noun, it allows the transition from state q[...] to q[...].
Since q[...] is a final state for the noun phrase sub-routine (as indicated
by the slash and subscript 1 in Figure 2), the grammar may now return
processing to the declarative sentence routine provided that no preposi-
tional phrase follows the presumed subject noun phrase. Since the next
word is was, which is lexically classified as a form of a stative verb be,
the grammar "pops" out of the NP sub-routine and returns control to the
declarative sentence routine, where it starts processing the predicate
(see Figure 1). So far, it has recognized and interpreted the noun phrase
the little boy, which it has tentatively parsed and interpreted semanti-
cally and pragmatically. The parsing is equivalent to an incomplete tree
structure as shown below. In addition, the grammar has pragmatically
mapped the subject NP onto some referent which is known to be a member
of the set of boys and the subset of little boys. On the basis of the
declarative sentence generator in Figure 1, it now anticipates a predicate
which will tell something about the little boy. The kinds of predicates
that are likely are limited by syntactic, semantic, and pragmatic con-
straints. Syntactically, on the basis of the transition network of
Figure 1, the NP may be followed by a transitive verb by taking the
transition from q[...] to q[...], or an intransitive verb by taking the
transition from q[...] to q[...]. The next word actually scanned is was,
which allows a transition from q[...] to q[...]. This is a final state, but
since the word bucked follows, the grammar does not "pop" (i.e., terminate
processing in that routine). As soon as bucked off is processed and
recognized as a past participle form of a transitive verb, the grammar has
specified the passive construction and that an agent of a certain type is
likely to follow. Pragmatically the grammar knows that certain four-legged
beasts which are ridden by human beings buck. Hence, an agent is expected
in a subsequent phrase, and certain properties of that agent are antici-
pated. The effect of this anticipation will be to facilitate processing
if correct, and to hinder it if incorrect. Wanat (1971) reported that
a prepositional phrase with by, such as by the barn in the sentence, the
boy was bucked off by the barn, was more difficult to process than a by-
phrase such as by the pony in the sentence, the boy was bucked off by
the pony. Not only is the by-phrase expected to refer to an agent, but
it is pragmatically constrained to mention an agent of a certain sort.
The rest of the analysis of sentence a proceeds through by [...]
and concludes with the processing of the agent noun phrase much the way
the subject noun phrase was processed. When processing concludes, the
grammar pops from state q[...] of Figure 1.
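
The pragmatic constraint on the agent of the passive can be sketched very roughly as a check of anticipated properties against the properties of the noun that actually arrives. The feature labels and the two small tables below are invented purely for illustration; they are assumptions, and nothing in Woods's formalism or in Wanat's data specifies them.

```python
# Hypothetical properties the grammar anticipates for the agent after a
# given participle (invented for illustration only).
EXPECTED_AGENT = {"bucked off": {"animate", "ridden"}}

# Hypothetical pragmatic properties of some nouns (also invented).
NOUN_FEATURES = {
    "pony": {"animate", "ridden", "four-legged"},
    "barn": {"inanimate", "location"},
}

def agent_expectation_met(participle: str, noun: str) -> bool:
    """True if the noun has every property anticipated for the agent.

    A participle with no entry imposes no constraint, so any noun passes.
    """
    expected = EXPECTED_AGENT.get(participle, set())
    return expected <= NOUN_FEATURES.get(noun, set())
```

On this sketch, `agent_expectation_met("bucked off", "pony")` holds while `agent_expectation_met("bucked off", "barn")` does not, which is one way of picturing why a confirmed anticipation should facilitate processing and a disconfirmed one should hinder it.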

Sentence b is analyzed in a similar way right up to the past parti-
ciple, which in b is gone away. The possibilities that exist at that
point, namely the ones that are specified by the grammar, are that the
sentence will terminate, or that it will be followed by a modifier in
the form of a prepositional phrase, or other adverbial or string of
adverbials. Whatever follows if the sentence does not terminate is not
constrained in the way that the agent in the by-phrase of the passive
was constrained for sentence a. The modifier that is subsequent to the
verb in sentence b may be a locative such as to his mother's; it may be
a time adverbial like by the next day; a manner adverbial like quickly;
or an instrumental adverbial like on a horse; etc.

These facts, which are conveniently represented in an expectancy
grammar, explain the data referred to earlier in this paper concerning
the differential processing of active and passive sentences. Although
the deliberately oversimplified examples that we worked through to this
point have attempted to account for language perception, complications
of the basic notions illustrated apply equally to perception and produc-
tion data. For example, combinations of perception and production, as
in reading aloud, taking dictation, or having a conversation, all
require an expectancy system of the sort illustrated. For this reason
it is particularly well-equipped to serve as a framework within which
the problems and data of language testing can be discussed.

Another useful extension of a theory of expectancy grammar is sug-
gested by the research of Watzlawick et al. (1967). Their work stresses
the fact that human communicative behavior has at least two aspects:
there is a relationship (affective) aspect of messages concerning how
people see each other as people, and there is a content (cognitive)
aspect encompassing the coding of factual information in the everyday
sense (see the remarks above in the section on historical perspective,
pp. 41-4[...]). To oversimplify a bit, factors that pertain to the rela-
tionship aspect of communication are basically the area of interest
countenanced by sociolinguists; factors that pertain to the content
aspect, on the other hand, are characteristically the domain of interest
of logicians, cognitive psychologists, and psycholinguists. Relation-
ship information is normally coded by what we term paralinguistic mech-
anisms, whereas content information is normally coded in segmental
phonemes, words, phrases, sentences, etc.

The research of Ogston and Condon (1971) shows an interesting connec-
tion between the paralinguistic mechanisms and content level mechanisms
of coding. They demonstrated that "as a normal person speaks, his body
'dances' in precise and ordered cadence with the speech as it is articu-
lated. The body moves in patterns of change which are directly propor-
tional to the articulated pattern of the speech stream (p. 15[...])." And
what is perhaps still more interesting: "A hearer's body was found to
'dance' in precise harmony with the speaker. When the units of change
in their behavior are segmented and displayed consecutively, the speaker
and hearer look like puppets moved by the same set of strings (p. 158)."
In commenting on the research of Ogston and Condon, Lenneberg (1971)
observed that it is apparently the case "that the flow of movements that
constitute motor behavior consists of 'chunks' each having a peculiar
'program' of nervous integration (p. 175)." Lenneberg goes on to observe
that

     the sequences of behavior in animals and man are, under normal
     conditions, extremely flexible. As the organism moves from situ-
     ation to situation, the patterns of sequences are constantly
     readjusted to fit specific demands; the only common denominator
     that remains between one motor sequence, for example, one epi-
     sode of catching prey, and the next is a logical principle or,
     in other words, a generalized pattern. If the individual associ-
     ates of change of neuromuscular events had to be stored one by
     one, it is difficult to see how and when the organism would have
     time to acquire the unique behavioral changes as they occur on
     one particular occasion, and how instantaneous transformations,
     which adjust behavior to the imperatives of the moment could be
     performed without a new trial and error procedure (p. 177).

All of this suggests that the organism possesses a complicated gener-
ative mechanism or hierarchy of programs that determines the behavior
appropriate to a given situation. An expectancy grammar seems to be a
natural mechanism for explaining these facts. Such a grammar seems not
only to underlie language behavior in particular, but human behavior in
general. That is to say, the human being has internalized a grammar of
expectancy which enables him to generate (out of a rich repertoire of
"grammatical" routines and sub-routines) unique models to fit particular
situations. I have suggested elsewhere that it is reasonable to assume
that such grammatical programs themselves are generated and constantly
modified by certain principles of learning (Oller, 1971a, 1972c, 1974b,
and forthcoming).

CONNECTIONS WITH LANGUAGE TESTING 

Within the context of expectancy grammars as models of underlying compe-
tence, a valid language test can be defined as one that activates the
expectancy grammar that the learner has internalized. The extent to which
the learner's grammar is able to synthesize and analyze meaningful
sequences of elements in the language is an index of his proficiency or
competence in the language. A great deal of data from research in second
language testing, in particular, shows that some kinds of tests are better
than others at activating the learner's internalized expectancy grammar.
The arguments considered in this section originated largely in second
language proficiency research. However, it should be borne in mind that
the conclusions and generalizations from the data have much wider appli-
cability.

Among the tests that appear to provide valid information about lan-
guage proficiency are the traditional dictation and the more recently
popularized cloze procedure. In various forms, these two and other inte-
grative tests have been advocated by Carroll (1961), Valette (1964, 1967),
Spolsky et al. (1968), Oller (1970a; 1973b; forthcoming and references
there), Johansson (1974), Angelis (1974), Upshur (1972), Upshur and Palmer
(1974), Gradman (1973), Stubbs and Tucker (1974), and many others. One
of the indications of the validity of such tests is their strong inter-
correlation with each other. A cloze test is based on a visual input that
is read by the examinee, while a dictation is based on an auditory input
that is heard by the examinee; nevertheless, they tend to correlate at
near the .90 level. This means that roughly 81% of the variance on the
tests is common variance. Similar results have been achieved with tests
of reading and speaking skills (Oller and Perkins, forthcoming, and Appen-
dix to Oller, forthcoming).
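
The step from a correlation of .90 to 81% common variance is simply the squaring of the correlation coefficient. A minimal sketch, with function names of our own choosing, shows the arithmetic together with a plain computation of the Pearson coefficient for two lists of scores.

```python
from math import sqrt
from typing import Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two sets of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def common_variance(r: float) -> float:
    """Proportion of variance shared by two tests correlating at r."""
    return r * r
```

With r = .90, `common_variance(0.90)` gives .81, the figure cited above; a correlation of .70, by comparison, would leave just under half of the variance in common.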

While the typical interpretation of such data by experts prior to the
1970's was that it only indicated test reliability (and more recently
Rand, 1972, and Educational Testing Service, 1970), there is reason to
believe that such data can only be interpreted as an indication of test
validity. In particular, such results show that the two types of tests
are probably tapping a common underlying skill. The notion of an "expec-
tancy grammar" offers a sound theoretical basis for explaining the overlap.

Figure 4 illustrates the facts that have been observed in a wide
variety of integrative tests, especially the subclass of pragmatic lan-
guage tests. The areas of the various circles can be taken as rough rep-
resentations of the variances on different pragmatic tests involving
listening, speaking, reading, and writing. The overlap between the
circles may be taken as evidence of a basic expectancy grammar that is
tapped by all of the tests. While the areas of non-overlap, i.e. the
shaded portions of the figure, may perhaps be attributed either to rela-
tively superficial differences in peripheral processing mechanisms (such
as hearing and seeing), or to unreliability, or to both, in the cases of
individual subjects where the correlation may be practically nil, an
explanation can be provided on the basis of disorders in the peripheral
processing mechanisms, unusual experiential background (such as only
having experienced the language in written form), and the like. For
example, a person may be weak in the ability to read English script even
though he understands spoken English very well. Similarly, a learner may
have acquired considerable skill in deciphering the written form of the
language and be quite inept at understanding its spoken form.

Some of the tests which qualify as belonging to the family of prag-
matic tests, in addition to those we have already noted, include the
Foreign Service Institute's Oral Interview, many reading tasks, essay
writing, and a great many other communication tasks. In general, these
tests have remarkable characteristics of stability and sensitivity. To
illustrate some of their practical and theoretical virtues, we will
review only a couple of samples of data from dictation and cloze tests.

A dictation of the sort that is reasonably termed a pragmatic test
is one that is administered at a normal conversational rate over segments
that challenge the short term memory span of the examinees. This kind,
contrary to much of what the "experts" said during the 1950's and 1960's,
has proved repeatedly to be an excellent device for the measurement of
language proficiency (Oller, 1970a; Johansson, 197[...]; Oller and Streiff,
1975), and it also works well as an elicitation device for data concerning
specific deficiencies in the internalized grammar of the second language
learner (Angelis, 1974). On a dictation as a global proficiency test,
the examinee's score is determined by counting the number of deleted
words, extraneous insertions, reversals of order, and phonologically muti-
lated entries.
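
A rough idea of how such an error count might be mechanized is sketched below. It is an approximation under our own assumptions, not the scoring procedure used in the studies cited: the word-by-word alignment is borrowed from Python's standard `difflib` module, a word replaced by a different word is counted as one substitution (a crude stand-in for a phonologically mutilated entry), and reversals of order are not detected as such but surface as a deletion plus an insertion.

```python
from difflib import SequenceMatcher
from typing import Dict

def dictation_errors(original: str, response: str) -> Dict[str, int]:
    """Count deleted, inserted, and substituted words in a dictation response."""
    a = original.lower().split()
    b = response.lower().split()
    counts = {"deleted": 0, "inserted": 0, "substituted": 0}
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        len_a, len_b = i2 - i1, j2 - j1
        if tag == "delete":
            counts["deleted"] += len_a
        elif tag == "insert":
            counts["inserted"] += len_b
        elif tag == "replace":
            # Pair off words one for one; any surplus on either side is
            # counted as a deletion or an insertion.
            paired = min(len_a, len_b)
            counts["substituted"] += paired
            counts["deleted"] += len_a - paired
            counts["inserted"] += len_b - paired
    counts["total"] = sum(counts.values())
    return counts
```

The function takes the dictated text and the examinee's transcription as plain strings. Punctuation is left attached to words, so in practice both strings would need to be normalized first.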

Of course, more specific "achievement" tests may be constructed by
salting the passage with particular sorts of phonological, morphological,
lexical, syntactic, semantic, or pragmatic exemplars of rule application.
Many other uses for this integrative testing technique, and the others
mentioned earlier, are not difficult to imagine. An important point to
remember in the construction of any such test is that the language it
represents should be characteristic of the kinds of situations and styles
of speech or writing the examinee is apt to encounter in the "real" use
of the language skills that the test seeks to measure. This is just
another way of saying that the activities required by the test, to be
valid, must resemble (at a deep level) as accurately as possible the real
life activities that they try to predict the examinee's skill in.

Figure 4. Variance overlap on integrative tests as an indi-
cation of an underlying grammar of expectancy. (Overlapping circles
labeled cloze test, dictation, reading test, and oral interview.)

A cloze test is constructed by deleting words from a passage of prose.
The task set the examinee is to replace the missing items. Probably the
simplest and most frequently used method for constructing a cloze test in
order to achieve a global estimate of language skill is to delete every
sixth or seventh word from a passage of prose of 300 to 350 words in
length. The scoring of the test may be done in several ways. The least
complicated method is to count the number of words that are restored
exactly as they were placed in the original text. Although this method
works about as well as any other in determining the readability of a text
for native speakers, either one of two other methods will probably work
better in the measurement of non-native proficiency. Darnell (1968)
recommends a method based on a fairly sophisticated comparison of the
non-native's choice on each item against a response frequency analysis
for native speakers. The drawback to this scoring technique is its
considerable complexity. Having native speakers make judgments of
acceptability actually works slightly better than the exact-word scoring
method, and is simpler to use than methods requiring response frequency
analysis (Darnell, 1968). As in the case of dictation, cloze tests also
are adaptable to many different testing purposes. Again, their theoreti-
cal claim to validity resides in the fact that they activate the inter-
nalized expectancy grammar of the examinee.
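
The every-nth-word deletion procedure and the two simpler scoring methods can be sketched as follows. The function names, the form of the blank, and the representation of native-speaker judgments as a table of acceptable alternatives are all assumptions made for the sake of illustration; Darnell's response-frequency method is not attempted here.

```python
from typing import Dict, Set, Tuple

def make_cloze(passage: str, n: int = 7, blank: str = "____") -> Tuple[str, Dict[int, str]]:
    """Delete every nth word; return the mutilated text and an answer key.

    The key maps the position of each deleted word to the original word.
    Punctuation stays attached to words, a simplification.
    """
    key: Dict[int, str] = {}
    mutilated = []
    for index, word in enumerate(passage.split()):
        if (index + 1) % n == 0:
            key[index] = word
            mutilated.append(blank)
        else:
            mutilated.append(word)
    return " ".join(mutilated), key

def score_exact(key: Dict[int, str], answers: Dict[int, str]) -> int:
    """Exact-word method: credit only the word that stood in the original."""
    return sum(
        1 for position, word in key.items()
        if answers.get(position, "").lower() == word.lower()
    )

def score_acceptable(
    key: Dict[int, str],
    answers: Dict[int, str],
    acceptable: Dict[int, Set[str]],
) -> int:
    """Acceptable-word method: also credit alternatives judged acceptable."""
    score = 0
    for position, word in key.items():
        allowed = {word.lower()} | {alt.lower() for alt in acceptable.get(position, set())}
        if answers.get(position, "").lower() in allowed:
            score += 1
    return score
```

In use, `make_cloze` would be applied to a passage of the length suggested above, and the examinee's responses recorded against the positions in the key. The `acceptable` table would be filled in from the judgments of native speakers.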

Pragmatic tests, such as cloze procedure and dictation, have not been
entirely wanting for criticism. It has been argued (Rand, 1972) that
they do not provide specific enough information on the precise points of
grammar where the examinee may be deficient, whereas discrete-point tests
are especially designed to do so. It seems, however, that this criticism
is not well founded. In general, pragmatic tests are much better
equipped to provide diagnostic information than discrete-point tests
because pragmatic tests elicit errors in contexts where the dynamic
aspects of grammar are operative (cf. [...], 1971, and Oller, forth-
coming).

It has also been argued that pragmatic tests do not clearly differenti-
ate non-native students who can and cannot succeed in college-level course
work (Rand, 1972). This argument is based on the false assumption that
a language test by itself can predict which foreign students will succeed
in a college level (or any other) course of study. If language skill
were all that were required, no English native speaker should ever fail,
and this is simply not the case.

Yet another criticism is that integrative tests are "too vague," that
one cannot tell precisely what they measure. This, however, is no fault
of the tests but is rather characteristic of the skill(s) they seek to
measure. Answering the question, "How much language proficiency is neces-
sary to understand what goes on in a college level course" is like answer-
ing the question, "How much light is sufficient to find one's way out of
a forest?" It depends hardly at all on any particular discrete "rays"
of light, and the same sort of thing can be said about "points" of lan-
guage skill. It just happens to be the case, moreover, that pragmatic
tests are better suited to the measurement of language proficiency
(albeit in a vague way) than discrete-point tests are.

Probably, expectancy grammars are as resistant to mutilations of vari-
ous sorts as is the use of language itself. We can frequently say what
we do not mean and still be correctly understood. Or, we may mean what
we do not say and still be understood. The well-known and now classic
experiments of Miller et al. (1951) for Bell Laboratories are evidence
enough of the fact that the speech signal itself may be quite badly dis-
torted and still be understood readily by a listener. The concept of a
pragmatically-motivated expectancy grammar suggests a straightforward
explanation for this. Similarly, a person who scarcely utters a word
that is clearly understandable in isolation may be understood easily when
his communications are dealt with in their extralinguistic and linguistic
contexts. Thus, pragmatic tests sample the non-native speaker's ability
to do what native speakers do in the normal use of language.

SUMMARY

The study of pragmatics as it relates generally to linguistic theory and
applied linguistics, and particularly to language testing, has been con-
sidered. Although the pragmatic approach may represent a change in the
emphasis of current research, it is rooted in a long history of concern
for the meaning-oriented aspects of language use and learning. The trans-
formational tradition has largely ignored the pragmatic nature of language
until quite recently, but now seems to be moving rapidly in the direction
of a pragmatically-motivated theory. The major premise of pragmatics that
a whole is greater than the sum of its parts has important consequences
for linguistics, applied linguistics, and language testing. Nor is it a
new idea. The notion of an "expectancy grammar" is employed as a basis
for explaining certain psycholinguistic facts as well as data from lan-
guage testing research. Pragmatic tests, it is claimed, are superior to
the discrete-point type in that they tap the underlying, internalized
expectancy grammar of the examinee.

FOOTNOTES

This article incorporates and expands material presented in three earlier
papers. The first, "Pragmatic Language Testing: A Theory for Use and a
Use for Theory," was an invited lecture presented at Indiana University
in January, 1973 at a meeting sponsored by the Committee on Research in
Educational Development and Language Instruction. (A version of that
lecture appeared in Language Sciences, 28, 1973, pp. 7-12.) The second
lecture, entitled "Pragmatics," was presented at the University of New
Mexico in April, 1973 at a meeting of the Duke City Linguistics Circle.
The third lecture, which had the same title as the present paper, was
given in San Juan, Puerto Rico in May, 1973 at an international seminar
on Language Testing, jointly sponsored by the AILA Commission on Language
Tests and Testing and the Organization of Teachers of English to Speakers
of Other Languages (TESOL).

The relevant works of any one of the four men mentioned here would
constitute an impressive bibliography. In connection with the notion
"pragmatics" and its relation to the philosophy of pragmatism, see Hayden
and Alworth (1965) and the selections in it by Peirce, James, and Dewey.
For one of the major works on the topic, see Morris (1938). A thorough
history of the topic would take us too far off course.

For instance, see Oller and Perkins (1978) for research demonstrating
the relevance of expectancy grammar to first language and bilingual con-
texts. (See also Part VI of Oller and Perkins, forthcoming.) Apparently,
language proficiency in the sense discussed in this paper is the key fac-
tor in a very wide range of educational contexts and especially tests.
In fact, one can make a case for the view that intelligence is intrinsi-
cally tied to language proficiency in the sense defined, and the tests
aimed at the former may really only be measuring the latter (see Oller,
Pragmatic tests are defined (Oller, forthcoming) as that class of
integrative tests meeting two requirements: first, they must require the
pragmatic mapping of utterances (or their surrogates) onto extralinguistic
context. This can be termed the meaning requirement. Second, they must
require the processing to take place under temporal constraints. This
may be termed the time requirement. Integrative tests, on the other
hand, are a much broader class of tests defined as the antithesis of
discrete-point tests.

I am indebted to the late Dr. Walton Geiger for this metaphor.